| Summary: | Inode number changes on a directory when one of subvolumes is down in replicate | | |
|---|---|---|---|
| Product: | [Community] GlusterFS | Reporter: | Sachidananda Urs <sac> |
| Component: | distribute | Assignee: | Amar Tumballi <amarts> |
| Status: | CLOSED CURRENTRELEASE | QA Contact: | |
| Severity: | medium | Docs Contact: | |
| Priority: | medium | | |
| Version: | 3.1.3 | CC: | amarts, gluster-bugs, rabhat, vbhat, vraman |
| Target Milestone: | --- | | |
| Target Release: | --- | | |
| Hardware: | x86_64 | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | | Type: | --- |
| Regression: | RTP | Mount Type: | --- |
| Documentation: | DNR | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
Description
Amar Tumballi
2011-03-24 02:32:21 UTC
got dev/ino changed error in a distributed replicate without even bringing any of the bricks down with master. Was running dbench -t 18000 100.

/bin/rm: cannot remove directory `./clients/client71/~dmtmp/ACCESS': Transport endpoint is not connected
/bin/rm: `./clients/client58' changed dev/ino
/bin/rm: cannot remove directory `./clients/client29/~dmtmp/ACCESS': Transport endpoint is not connected
/bin/rm: cannot remove directory `./clients/client51/~dmtmp/ACCESS': Transport endpoint is not connected
/bin/rm: cannot remove directory `./clients/client97/~dmtmp/EXCEL': Transport endpoint is not connected
/bin/rm: cannot remove directory `./clients/client39/~dmtmp/WORDPRO': Transport endpoint is not connected
/bin/rm: cannot remove directory `./clients/client70/~dmtmp/ACCESS': Transport endpoint is not connected
/bin/rm: cannot remove directory `./clients/client2/~dmtmp/ACCESS': Transport endpoint is not connected
/bin/rm: `./clients/client0' changed dev/ino
/bin/rm: cannot remove directory `./clients/client67/~dmtmp/ACCESS': Transport endpoint is not connected
/bin/rm: cannot remove directory `./clients/client80/~dmtmp/ACCESS': Transport endpoint is not connected
/bin/rm: cannot remove directory `./clients/client65/~dmtmp/ACCESS': Transport endpoint is not connected
/bin/rm: cannot remove directory `./clients/client16/~dmtmp/ACCESS': Transport endpoint is not connected
/bin/rm: `./clients/client80' changed dev/ino
/bin/rm: cannot remove directory `./clients/client72/~dmtmp/ACCESS': Transport endpoint is not connected
/bin/rm: cannot remove directory `./clients/client10/~dmtmp/ACCESS': Transport endpoint is not connected
/bin/rm: cannot remove directory `./clients/client1/~dmtmp/ACCESS': Transport endpoint is not connected
/bin/rm: cannot remove directory `./clients/client32/~dmtmp/EXCEL': Transport endpoint is not connected
/bin/rm: cannot remove directory `./clients/client79/~dmtmp/EXCEL': Transport endpoint is not connected
/bin/rm: cannot remove directory `./clients/client76/~dmtmp/ACCESS': Transport endpoint is not connected
/bin/rm: cannot remove directory `./clients/client7/~dmtmp/ACCESS': Transport endpoint is not connected
/bin/rm: cannot remove directory `./clients/client37/~dmtmp/ACCESS': Transport endpoint is not connected
/bin/rm: cannot remove directory `./clients/client57/~dmtmp/EXCEL': Transport endpoint is not connected
/bin/rm: cannot remove directory `./clients/client27/~dmtmp/ACCESS': Transport endpoint is not connected
/bin/rm: cannot remove directory `./clients/client94/~dmtmp/EXCEL': Transport endpoint is not connected
/bin/rm: cannot remove directory `./clients/client49/~dmtmp/ACCESS': Transport endpoint is not connected
/bin/rm: `./clients/client76' changed dev/ino
/bin/rm: cannot remove directory `./clients/client92/~dmtmp/ACCESS': Transport endpoint is not connected
/bin/rm: cannot remove directory `./clients/client97/~dmtmp/ACCESS': Transport endpoint is not connected
/bin/rm: `./clients/client49' changed dev/ino
/bin/rm: cannot remove directory `./clients/client41/~dmtmp/ACCESS': Transport endpoint is not connected
/bin/rm: cannot remove directory `./clients/client39/~dmtmp/EXCEL': Transport endpoint is not connected
/bin/rm: `./clients/client92' changed dev/ino
/bin/rm: cannot remove directory `./clients/client79/~dmtmp/ACCESS': Transport endpoint is not connected
/bin/rm: cannot remove directory `./clients/client32/~dmtmp/ACCESS': Transport endpoint is not connected
/bin/rm: cannot remove directory `./clients/client94/~dmtmp/ACCESS': Transport endpoint is not connected
/bin/rm: cannot remove directory `./clients/client57/~dmtmp/ACCESS': Transport endpoint is not connected
/bin/rm: cannot remove directory `./clients/client39/~dmtmp/ACCESS': Transport endpoint is not connected
/bin/rm: `./clients/client57' changed dev/ino

None of the servers are down even though there have been errors saying Transport endpoint is not connected. Below is the log message indicating some disconnections.

[2011-03-22 20:56:07.903495] I [client3_1-fops.c:1262:client3_1_finodelk_cbk] 0-mirror-client-0: remote operation failed: Transport endpoint is not connected
[2011-03-22 20:56:07.903601] E [rpc-clnt.c:197:call_bail] 0-mirror-client-0: bailing out frame type(GlusterFS 3.1) op(FINODELK(30)) xid = 0x14587898x sent = 2011-03-22 20:25:58.588610. timeout = 1800
[2011-03-22 20:56:07.903615] I [client3_1-fops.c:1262:client3_1_finodelk_cbk] 0-mirror-client-0: remote operation failed: Transport endpoint is not connected
[2011-03-22 20:56:07.903740] E [rpc-clnt.c:197:call_bail] 0-mirror-client-1: bailing out frame type(GlusterFS 3.1) op(FLUSH(15)) xid = 0x12163006x sent = 2011-03-22 20:25:58.735613. timeout = 1800
[2011-03-22 20:56:07.903770] I [client3_1-fops.c:734:client3_1_flush_cbk] 0-mirror-client-1: remote operation failed: Transport endpoint is not connected
[2011-03-22 20:56:07.903994] D [afr-lk-common.c:410:transaction_lk_op] 0-mirror-replicate-0: lk op is for a transaction
[2011-03-22 20:56:07.904200] E [rpc-clnt.c:197:call_bail] 0-mirror-client-1: bailing out frame type(GlusterFS 3.1) op(FXATTROP(34)) xid = 0x12163005x sent = 2011-03-22 20:25:58.735381. timeout = 1800
[2011-03-22 20:56:07.904321] E [rpc-clnt.c:197:call_bail] 0-mirror-client-1: bailing out frame type(GlusterFS 3.1) op(SETXATTR(17)) xid = 0x12162989x sent = 2011-03-22 20:25:58.731882. timeout = 1800
[2011-03-22 20:56:07.904375] I [client3_1-fops.c:818:client3_1_setxattr_cbk] 0-mirror-client-1: remote operation failed: Transport endpoint is not connected
[2011-03-22 20:56:07.904500] D [afr-lk-common.c:410:transaction_lk_op] 0-mirror-replicate-0: lk op is for a transaction
[2011-03-22 20:56:07.904558] E [rpc-clnt.c:197:call_bail] 0-mirror-client-1: bailing out frame type(GlusterFS 3.1) op(UNLINK(5)) xid = 0x12162985x sent = 2011-03-22 20:25:58.731004. timeout = 1800
[2011-03-22 20:56:07.904580] I [client3_1-fops.c:502:client3_1_unlink_cbk] 0-mirror-client-1: remote operation failed: Transport endpoint is not connected
[2011-03-22 20:56:07.908438] D [afr-transaction.c:980:afr_post_nonblocking_inodelk_cbk] 0-mirror-replicate-1: Non blocking inodelks done. Proceeding to FOP
[2011-03-22 20:56:07.909301] D [client-lk.c:408:delete_granted_locks_owner] 0-mirror-client-3: Number of locks cleared=0
[2011-03-22 20:56:07.909388] D [client3_1-fops.c:724:client3_1_flush_cbk] 0-mirror-client-3: deleting locks of owner (12927619011809998649) returned 0
[2011-03-22 20:56:07.909418] D [client-lk.c:408:delete_granted_locks_owner] 0-mirror-client-2: Number of locks cleared=0
[2011-03-22 20:56:07.909488] D [client3_1-fops.c:724:client3_1_flush_cbk] 0-mirror-client-2: deleting locks of owner (12927619011809998649) returned 0
[2011-03-22 20:56:07.909504] D [afr-lk-common.c:410:transaction_lk_op] 0-mirror-replicate-1: lk op is for a transaction
[2011-03-22 20:56:07.909700] D [afr-lk-common.c:410:transaction_lk_op] 0-mirror-replicate-0: lk op is for a transaction
[2011-03-22 20:56:07.910444] D [client-lk.c:442:delete_granted_locks_fd] 0-mirror-client-2: Number of locks cleared=0
[2011-03-22 20:56:07.910674] D [client-lk.c:442:delete_granted_locks_fd] 0-mirror-client-3: Number of locks cleared=0
[2011-03-22 20:56:07.910790] D [afr-lk-common.c:987:afr_lock_blocking] 0-mirror-replicate-0: we're done locking
[2011-03-22 20:56:07.910806] D [afr-transaction.c:1054:afr_post_blocking_rename_cbk] 0-mirror-replicate-0: Blocking entrylks done. Proceeding to FOP
[2011-03-22 20:56:07.932553] E [rpc-clnt.c:197:call_bail] 0-mirror-client-1: bailing out frame type(GlusterFS 3.1) op(FLUSH(15)) xid = 0x12162983x sent = 2011-03-22 20:25:58.730629. timeout = 1800
[2011-03-22 20:56:07.932576] I [client3_1-fops.c:734:client3_1_flush_cbk] 0-mirror-client-1: remote operation failed: Transport endpoint is not connected

Unable to delete a directory tree when one of the nodes is down in a distributed+replicate setup.

Setup: 16 nodes in an 8x2 replicate + distribute configuration. Bring down one of the nodes and try to remove a directory that contains subdirectories. Directory deletion fails with "changed dev/ino".

http://patches.gluster.com/patch/6704/
http://patches.gluster.com/patch/6636/
These patches fix the bug in the master branch. The fix is also committed in the release-3.1 branch.

Now I can delete the directory tree from the mount point. I created a tree with width 10 and depth 4, and I can now delete it from the mount point.

[root@FC-3 mnt]# ls
00000000000000 00000000000001 00000000000002 00000000000003 00000000000004 00000000000005 00000000000006 00000000000007 00000000000008 00000000000009 filegen.py
[root@FC-3 mnt]# rm -rf 00000000000002/
[root@FC-3 mnt]# ls
00000000000000 00000000000001 00000000000003 00000000000004 00000000000005 00000000000006 00000000000007 00000000000008 00000000000009 filegen.py
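
The verification above (build a directory tree of width 10 and depth 4, then remove it from the mount point) could be scripted roughly as follows. This is a minimal sketch, not the `filegen.py` seen in the listing, whose contents are not shown in this report. The names `make_tree`, `dev_ino`, `unstable_dirs`, and the `MOUNT` environment variable are all hypothetical. The check assumes that `rm` reports "changed dev/ino" when a directory's `(st_dev, st_ino)` pair differs between two visits, so it stats each directory twice and compares. Two back-to-back stats may well not have triggered the original bug; the comparison only illustrates the identity check involved.

```python
import os
import shutil
import tempfile


def make_tree(root, width, depth):
    """Create `width` subdirectories per level, `depth` levels deep.

    Returns the number of directories created. Names are zero-padded to
    14 digits to resemble the listing in the report (an assumption).
    """
    if depth == 0:
        return 0
    created = 0
    for i in range(width):
        path = os.path.join(root, f"{i:014d}")
        os.mkdir(path)
        created += 1 + make_tree(path, width, depth - 1)
    return created


def dev_ino(path):
    """Return the (st_dev, st_ino) pair identifying `path`."""
    st = os.stat(path)
    return (st.st_dev, st.st_ino)


def unstable_dirs(root):
    """Stat every directory under `root` twice and return those whose
    (st_dev, st_ino) pair differs between the two calls."""
    changed = []
    for dirpath, _dirnames, _filenames in os.walk(root):
        if dev_ino(dirpath) != dev_ino(dirpath):
            changed.append(dirpath)
    return changed


if __name__ == "__main__":
    # Hypothetical: set MOUNT to the GlusterFS mount point. Without it,
    # a local temporary directory is used so the sketch still runs.
    mount = os.environ.get("MOUNT") or tempfile.mkdtemp()
    top = os.path.join(mount, "tree")
    os.mkdir(top)
    print("created", make_tree(top, width=10, depth=4), "directories")
    print("directories with unstable dev/ino:", unstable_dirs(top))
    shutil.rmtree(top)
    print("removed:", not os.path.exists(top))
```

To exercise the failure scenario from the report, the same script would be run against the mount while one replica node is down. On a fixed build, the list of unstable directories is expected to be empty and the removal to succeed.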