Bug 764316 (GLUSTER-2584)

Summary: Inode number changes on a directory when one of the subvolumes is down in replicate
Product: [Community] GlusterFS
Reporter: Sachidananda Urs <sac>
Component: distribute
Assignee: Amar Tumballi <amarts>
Status: CLOSED CURRENTRELEASE
Severity: medium
Priority: medium
Version: 3.1.3
CC: amarts, gluster-bugs, rabhat, vbhat, vraman
Hardware: x86_64
OS: Linux
Doc Type: Bug Fix
Regression: RTP
Documentation: DNR

Description Amar Tumballi 2011-03-24 02:32:21 UTC
*** Bug 2587 has been marked as a duplicate of this bug. ***

Comment 1 Raghavendra Bhat 2011-03-24 02:36:03 UTC
Got a "changed dev/ino" error on a distributed-replicate volume running master, without bringing any of the bricks down.

I was running dbench -t 18000 100 (100 clients for 18000 seconds).

/bin/rm: cannot remove directory `./clients/client71/~dmtmp/ACCESS': Transport endpoint is not connected
/bin/rm: `./clients/client58' changed dev/ino
/bin/rm: cannot remove directory `./clients/client29/~dmtmp/ACCESS': Transport endpoint is not connected
/bin/rm: cannot remove directory `./clients/client51/~dmtmp/ACCESS': Transport endpoint is not connected
/bin/rm: cannot remove directory `./clients/client97/~dmtmp/EXCEL': Transport endpoint is not connected
/bin/rm: cannot remove directory `./clients/client39/~dmtmp/WORDPRO': Transport endpoint is not connected
/bin/rm: cannot remove directory `./clients/client70/~dmtmp/ACCESS': Transport endpoint is not connected
/bin/rm: cannot remove directory `./clients/client2/~dmtmp/ACCESS': Transport endpoint is not connected
/bin/rm: `./clients/client0' changed dev/ino
/bin/rm: cannot remove directory `./clients/client67/~dmtmp/ACCESS': Transport endpoint is not connected
/bin/rm: cannot remove directory `./clients/client80/~dmtmp/ACCESS': Transport endpoint is not connected
/bin/rm: cannot remove directory `./clients/client65/~dmtmp/ACCESS': Transport endpoint is not connected
/bin/rm: cannot remove directory `./clients/client16/~dmtmp/ACCESS': Transport endpoint is not connected
/bin/rm: `./clients/client80' changed dev/ino
/bin/rm: cannot remove directory `./clients/client72/~dmtmp/ACCESS': Transport endpoint is not connected
/bin/rm: cannot remove directory `./clients/client10/~dmtmp/ACCESS': Transport endpoint is not connected
/bin/rm: cannot remove directory `./clients/client1/~dmtmp/ACCESS': Transport endpoint is not connected
/bin/rm: cannot remove directory `./clients/client32/~dmtmp/EXCEL': Transport endpoint is not connected
/bin/rm: cannot remove directory `./clients/client79/~dmtmp/EXCEL': Transport endpoint is not connected
/bin/rm: cannot remove directory `./clients/client76/~dmtmp/ACCESS': Transport endpoint is not connected
/bin/rm: cannot remove directory `./clients/client7/~dmtmp/ACCESS': Transport endpoint is not connected
/bin/rm: cannot remove directory `./clients/client37/~dmtmp/ACCESS': Transport endpoint is not connected
/bin/rm: cannot remove directory `./clients/client57/~dmtmp/EXCEL': Transport endpoint is not connected
/bin/rm: cannot remove directory `./clients/client27/~dmtmp/ACCESS': Transport endpoint is not connected
/bin/rm: cannot remove directory `./clients/client94/~dmtmp/EXCEL': Transport endpoint is not connected
/bin/rm: cannot remove directory `./clients/client49/~dmtmp/ACCESS': Transport endpoint is not connected
/bin/rm: `./clients/client76' changed dev/ino
/bin/rm: cannot remove directory `./clients/client92/~dmtmp/ACCESS': Transport endpoint is not connected
/bin/rm: cannot remove directory `./clients/client97/~dmtmp/ACCESS': Transport endpoint is not connected
/bin/rm: `./clients/client49' changed dev/ino
/bin/rm: cannot remove directory `./clients/client41/~dmtmp/ACCESS': Transport endpoint is not connected
/bin/rm: cannot remove directory `./clients/client39/~dmtmp/EXCEL': Transport endpoint is not connected
/bin/rm: `./clients/client92' changed dev/ino
/bin/rm: cannot remove directory `./clients/client79/~dmtmp/ACCESS': Transport endpoint is not connected
/bin/rm: cannot remove directory `./clients/client32/~dmtmp/ACCESS': Transport endpoint is not connected
/bin/rm: cannot remove directory `./clients/client94/~dmtmp/ACCESS': Transport endpoint is not connected
/bin/rm: cannot remove directory `./clients/client57/~dmtmp/ACCESS': Transport endpoint is not connected
/bin/rm: cannot remove directory `./clients/client39/~dmtmp/ACCESS': Transport endpoint is not connected
/bin/rm: `./clients/client57' changed dev/ino
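
None of the servers went down, even though rm reported "Transport endpoint is not connected" errors.

Below are the client log messages showing the disconnections. Each call_bail entry records a frame that got no reply within the 1800-second timeout (sent at 20:25:58, bailed out at 20:56:07); the corresponding callback then reports "Transport endpoint is not connected".
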
[2011-03-22 20:56:07.903495] I [client3_1-fops.c:1262:client3_1_finodelk_cbk] 0-mirror-client-0: remote operation failed: Transport endpoint is not connected
[2011-03-22 20:56:07.903601] E [rpc-clnt.c:197:call_bail] 0-mirror-client-0: bailing out frame type(GlusterFS 3.1) op(FINODELK(30)) xid = 0x14587898x sent = 2011-03-22 20:25:58.588610. timeout = 1800
[2011-03-22 20:56:07.903615] I [client3_1-fops.c:1262:client3_1_finodelk_cbk] 0-mirror-client-0: remote operation failed: Transport endpoint is not connected
[2011-03-22 20:56:07.903740] E [rpc-clnt.c:197:call_bail] 0-mirror-client-1: bailing out frame type(GlusterFS 3.1) op(FLUSH(15)) xid = 0x12163006x sent = 2011-03-22 20:25:58.735613. timeout = 1800
[2011-03-22 20:56:07.903770] I [client3_1-fops.c:734:client3_1_flush_cbk] 0-mirror-client-1: remote operation failed: Transport endpoint is not connected
[2011-03-22 20:56:07.903994] D [afr-lk-common.c:410:transaction_lk_op] 0-mirror-replicate-0: lk op is for a transaction
[2011-03-22 20:56:07.904200] E [rpc-clnt.c:197:call_bail] 0-mirror-client-1: bailing out frame type(GlusterFS 3.1) op(FXATTROP(34)) xid = 0x12163005x sent = 2011-03-22 20:25:58.735381. timeout = 1800
[2011-03-22 20:56:07.904321] E [rpc-clnt.c:197:call_bail] 0-mirror-client-1: bailing out frame type(GlusterFS 3.1) op(SETXATTR(17)) xid = 0x12162989x sent = 2011-03-22 20:25:58.731882. timeout = 1800
[2011-03-22 20:56:07.904375] I [client3_1-fops.c:818:client3_1_setxattr_cbk] 0-mirror-client-1: remote operation failed: Transport endpoint is not connected
[2011-03-22 20:56:07.904500] D [afr-lk-common.c:410:transaction_lk_op] 0-mirror-replicate-0: lk op is for a transaction
[2011-03-22 20:56:07.904558] E [rpc-clnt.c:197:call_bail] 0-mirror-client-1: bailing out frame type(GlusterFS 3.1) op(UNLINK(5)) xid = 0x12162985x sent = 2011-03-22 20:25:58.731004. timeout = 1800
[2011-03-22 20:56:07.904580] I [client3_1-fops.c:502:client3_1_unlink_cbk] 0-mirror-client-1: remote operation failed: Transport endpoint is not connected
[2011-03-22 20:56:07.908438] D [afr-transaction.c:980:afr_post_nonblocking_inodelk_cbk] 0-mirror-replicate-1: Non blocking inodelks done. Proceeding to FOP
[2011-03-22 20:56:07.909301] D [client-lk.c:408:delete_granted_locks_owner] 0-mirror-client-3: Number of locks cleared=0
[2011-03-22 20:56:07.909388] D [client3_1-fops.c:724:client3_1_flush_cbk] 0-mirror-client-3: deleting locks of owner (12927619011809998649) returned 0
[2011-03-22 20:56:07.909418] D [client-lk.c:408:delete_granted_locks_owner] 0-mirror-client-2: Number of locks cleared=0
[2011-03-22 20:56:07.909488] D [client3_1-fops.c:724:client3_1_flush_cbk] 0-mirror-client-2: deleting locks of owner (12927619011809998649) returned 0
[2011-03-22 20:56:07.909504] D [afr-lk-common.c:410:transaction_lk_op] 0-mirror-replicate-1: lk op is for a transaction
[2011-03-22 20:56:07.909700] D [afr-lk-common.c:410:transaction_lk_op] 0-mirror-replicate-0: lk op is for a transaction
[2011-03-22 20:56:07.910444] D [client-lk.c:442:delete_granted_locks_fd] 0-mirror-client-2: Number of locks cleared=0
[2011-03-22 20:56:07.910674] D [client-lk.c:442:delete_granted_locks_fd] 0-mirror-client-3: Number of locks cleared=0
[2011-03-22 20:56:07.910790] D [afr-lk-common.c:987:afr_lock_blocking] 0-mirror-replicate-0: we're done locking
[2011-03-22 20:56:07.910806] D [afr-transaction.c:1054:afr_post_blocking_rename_cbk] 0-mirror-replicate-0: Blocking entrylks done. Proceeding to FOP
[2011-03-22 20:56:07.932553] E [rpc-clnt.c:197:call_bail] 0-mirror-client-1: bailing out frame type(GlusterFS 3.1) op(FLUSH(15)) xid = 0x12162983x sent = 2011-03-22 20:25:58.730629. timeout = 1800
[2011-03-22 20:56:07.932576] I [client3_1-fops.c:734:client3_1_flush_cbk] 0-mirror-client-1: remote operation failed: Transport endpoint is not connected
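Because every call_bail line records both when the frame was sent and the configured timeout, the time each frame actually waited can be computed from the log itself. The following is a minimal sketch, assuming the 3.1-era log line format shown above (other releases may word these lines differently); parse_call_bail, BailedFrame and CALL_BAIL_RE are hypothetical names, not part of GlusterFS.

```python
import re
from dataclasses import dataclass
from datetime import datetime

# Matches call_bail lines such as those in the client log above.
# Assumption: GlusterFS 3.1-era wording; adjust for other releases.
CALL_BAIL_RE = re.compile(
    r"^\[(?P<logged>\d{4}-\d{2}-\d{2} [\d:.]+)\] E "
    r"\[rpc-clnt\.c:\d+:call_bail\] (?P<xlator>[^:\s]+): bailing out frame "
    r".*? op\((?P<op>\w+)\(\d+\)\) xid = (?P<xid>\S+) "
    r"sent = (?P<sent>\d{4}-\d{2}-\d{2} [\d:.]+)\. timeout = (?P<timeout>\d+)"
)
TS_FMT = "%Y-%m-%d %H:%M:%S.%f"


@dataclass
class BailedFrame:
    xlator: str    # client translator that gave up, e.g. 0-mirror-client-1
    op: str        # file operation name, e.g. FLUSH
    xid: str       # RPC transaction id as printed in the log
    waited: float  # seconds between "sent" and the bail-out log entry
    timeout: int   # frame timeout printed on the same line


def parse_call_bail(line):
    """Return a BailedFrame for a call_bail line, or None for any other line."""
    m = CALL_BAIL_RE.match(line)
    if m is None:
        return None
    logged = datetime.strptime(m["logged"], TS_FMT)
    sent = datetime.strptime(m["sent"], TS_FMT)
    return BailedFrame(
        xlator=m["xlator"],
        op=m["op"],
        xid=m["xid"],
        waited=(logged - sent).total_seconds(),
        timeout=int(m["timeout"]),
    )
```

Run over the call_bail lines above, each frame waited roughly 1809 seconds against timeout = 1800 (the FINODELK frame on 0-mirror-client-0, for example, waited about 1809.3 s), which is consistent with the "Transport endpoint is not connected" errors coming from frame timeouts rather than from a server actually going down.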

Comment 2 Sachidananda Urs 2011-03-24 02:46:04 UTC
Unable to delete a directory tree when one of the nodes is down in a distributed-replicate setup.

Setup: 16 nodes in an 8x2 distributed-replicate configuration. Bring down one of the nodes and try to remove a directory that contains subdirectories.

Directory deletion fails with a "changed dev/ino" error.
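The "changed dev/ino" message roughly means rm found that a directory's device/inode pair no longer matched what it had recorded during the traversal, which is what a client sees when the inode number the mount reports for a directory changes. The sketch below is a simplified, hypothetical illustration of such a check, not coreutils or GlusterFS code (open_dir_checked, remove_tree and ChangedDevIno are made-up names): it records (st_dev, st_ino) with lstat() and compares them with fstat() on the opened directory before descending.

```python
import os
import stat
import tempfile


class ChangedDevIno(OSError):
    """Raised when a directory's (st_dev, st_ino) differs between lstat() and open()."""


def open_dir_checked(path):
    """Open directory `path`, verifying it is the same inode lstat() reported.

    Hypothetical, simplified dev/ino consistency check: identity is read
    with lstat() before opening and compared with fstat() afterwards.
    """
    before = os.lstat(path)
    if not stat.S_ISDIR(before.st_mode):
        raise NotADirectoryError(path)
    fd = os.open(path, os.O_RDONLY | os.O_DIRECTORY)
    after = os.fstat(fd)
    if (before.st_dev, before.st_ino) != (after.st_dev, after.st_ino):
        os.close(fd)
        raise ChangedDevIno(f"{path!r} changed dev/ino")
    return fd


def remove_tree(path):
    """Depth-first delete of `path`; raises ChangedDevIno if a directory's identity shifts."""
    fd = open_dir_checked(path)
    try:
        names = os.listdir(fd)  # list entries through the verified descriptor
    finally:
        os.close(fd)
    for name in names:
        child = os.path.join(path, name)
        if stat.S_ISDIR(os.lstat(child).st_mode):
            remove_tree(child)
        else:
            os.unlink(child)  # files and symlinks are unlinked, never followed
    os.rmdir(path)


if __name__ == "__main__":
    # Local demo only; on a healthy local filesystem the check never fires.
    base = tempfile.mkdtemp()
    os.makedirs(os.path.join(base, "a", "b"))
    remove_tree(base)
    print(os.path.exists(base))  # False
```

On a local filesystem the identity never changes, so the check is silent; on the affected replicate setup, an inode number that differs between the two calls would make a check of this kind abort, which matches the failure reported here.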

Comment 3 Amar Tumballi 2011-04-07 00:04:21 UTC
http://patches.gluster.com/patch/6704/
http://patches.gluster.com/patch/6636/

These patches fix the bug in the master branch; the fix has also been committed to the release-3.1 branch.

Comment 4 M S Vishwanath Bhat 2011-04-07 05:46:38 UTC
I can now delete directory trees from the mount point. I created a tree of width 10 and depth 4 and deleted one of its top-level directories:

[root@FC-3 mnt]# ls
00000000000000 00000000000001 00000000000002 00000000000003 00000000000004 00000000000005 00000000000006 00000000000007 00000000000008 00000000000009  filegen.py
[root@FC-3 mnt]# rm -rf 00000000000002/
[root@FC-3 mnt]# ls
00000000000000 00000000000001 00000000000003 00000000000004 00000000000005 00000000000006 00000000000007 00000000000008 00000000000009  filegen.py
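The filegen.py script used above is not included in this report. As a hypothetical stand-in, the sketch below builds a directory-only tree of a given width and depth with zero-padded 14-digit names like those in the listing; whether filegen.py also creates files, and how it names deeper levels, is not stated, so make_tree and its parameters are assumptions.

```python
import os
import shutil
import tempfile


def make_tree(root, width, depth, name_width=14):
    """Create `width` subdirectories per level, `depth` levels deep, under root.

    Hypothetical stand-in for filegen.py. Names are zero-padded integers
    such as 00000000000002. Returns the total number of directories created.
    """
    if depth == 0:
        return 0
    created = 0
    for i in range(width):
        child = os.path.join(root, str(i).zfill(name_width))
        os.mkdir(child)
        created += 1 + make_tree(child, width, depth - 1, name_width)
    return created


if __name__ == "__main__":
    # Small local demo; in the bug's setup, root would be the GlusterFS mount point.
    root = tempfile.mkdtemp()
    print(make_tree(root, width=3, depth=2))  # 12
    shutil.rmtree(os.path.join(root, "00000000000002"))  # like rm -rf 00000000000002/
    print(sorted(os.listdir(root)))  # ['00000000000000', '00000000000001']
    shutil.rmtree(root)
```

With width=10 and depth=4, as in the verification above, this creates 10 + 100 + 1000 + 10000 = 11110 directories.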