Bug 1368093
Summary: | Remove-brick: Remove-brick start status failed during continuous lookup+directory rename | ||
---|---|---|---|
Product: | [Red Hat Storage] Red Hat Gluster Storage | Reporter: | Prasad Desala <tdesala> |
Component: | distribute | Assignee: | Susant Kumar Palai <spalai> |
Status: | CLOSED DUPLICATE | QA Contact: | Prasad Desala <tdesala> |
Severity: | high | Docs Contact: | |
Priority: | low | ||
Version: | rhgs-3.1 | CC: | amukherj, aspandey, nbalacha, pkarampu, rcyriac, rgowdapp, rhinduja, rhs-bugs, sheggodu, storage-qa-internal, tdesala |
Target Milestone: | --- | Keywords: | ZStream |
Target Release: | --- | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | | Doc Type: | If docs needed, set a value
Doc Text: | | Story Points: | ---
Clone Of: | | Environment: |
Last Closed: | 2018-11-19 09:00:41 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | | Category: | ---
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: |
Description
Prasad Desala
2016-08-18 11:35:42 UTC
This issue is also seen with a distributed-replicate volume. The same scenario was tested with a pure distribute volume: rebalance started and none of the error messages reported in this BZ were seen. The issue was seen with a distributed-disperse volume, hence changing the component to disperse.

I am able to reproduce this issue consistently on my system by following the same steps mentioned in this BZ. As soon as the remove-brick command is triggered, the following error shows up on the mount point (sometimes remove-brick on the first set of bricks succeeded, but the second set gave the same error):

[root@apandey vol]# /home/apandey/test.sh
mv: cannot move ‘dir-841’ to ‘dir-842’: Transport endpoint is not connected
mv: cannot stat ‘dir-842’: No such file or directory

We saw this error only for directory renames. All other I/O, such as untarring a Linux kernel tree and "ls -lRt", ran without any issue. I talked to Raghavendra Gowdappa about this issue, and he suspects glusterd's handling of the volume-file change while I/O is in progress.
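For reference, the workload behind the failure above boils down to a directory-rename loop running against the mount while bricks are removed. A minimal sketch follows; the volume name, brick paths, mount point, and loop structure are assumptions, not the reporter's actual test.sh.

```sh
#!/bin/bash
# Hypothetical reproduction sketch: continuous lookup + directory rename on a
# FUSE mount while a remove-brick operation runs on the same volume.
# Volume name (testvol), brick paths, and mount point are assumptions.

MOUNT=/mnt/testvol
cd "$MOUNT" || exit 1

mkdir -p dir-0
# Keep issuing lookups and renaming the directory in the background.
( i=0
  while true; do
      ls -lRt . > /dev/null 2>&1        # lookups
      mv "dir-$i" "dir-$((i + 1))"      # directory rename
      i=$((i + 1))
  done ) &

# Meanwhile, remove a set of bricks; the mv loop starts failing with
# "Transport endpoint is not connected" shortly after this is triggered.
gluster volume remove-brick testvol server1:/bricks/b1 server2:/bricks/b2 start
gluster volume remove-brick testvol server1:/bricks/b1 server2:/bricks/b2 status
```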
Second issue: although we saw ENOTCONN on the mount point for the directory rename, rebalance did start, and after some time it failed with an assertion:

[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `/usr/local/sbin/glusterfs -s localhost --volfile-id rebalance/vol --xlator-opti'.
Program terminated with signal SIGABRT, Aborted.
#0 0x00007fb1798628d7 in raise () from /lib64/libc.so.6
Missing separate debuginfos, use: debuginfo-install bzip2-libs-1.0.6-14.fc21.x86_64 elfutils-libelf-0.163-4.fc21.x86_64 elfutils-libs-0.163-4.fc21.x86_64 glibc-2.20-8.fc21.x86_64 keyutils-libs-1.5.9-4.fc21.x86_64 krb5-libs-1.12.2-19.fc21.x86_64 libcom_err-1.42.12-4.fc21.x86_64 libgcc-4.9.2-6.fc21.x86_64 libselinux-2.3-10.fc21.x86_64 libuuid-2.25.2-3.fc21.x86_64 nss-mdns-0.10-15.fc21.x86_64 openssl-libs-1.0.1k-12.fc21.x86_64 pcre-8.35-14.fc21.x86_64 sssd-client-1.12.5-5.fc21.x86_64 systemd-libs-216-25.fc21.x86_64 xz-libs-5.1.2-14alpha.fc21.x86_64 zlib-1.2.8-7.fc21.x86_64
(gdb) bt
#0 0x00007fb1798628d7 in raise () from /lib64/libc.so.6
#1 0x00007fb17986453a in abort () from /lib64/libc.so.6
#2 0x00007fb17985b47d in __assert_fail_base () from /lib64/libc.so.6
#3 0x00007fb17985b532 in __assert_fail () from /lib64/libc.so.6
#4 0x00007fb16cf68d1b in ec_manager_setattr (fop=0x7fb1680b965c, state=4) at ec-inode-write.c:394
#5 0x00007fb16cf4a1da in __ec_manager (fop=0x7fb1680b965c, error=0) at ec-common.c:2283
#6 0x00007fb16cf45bbf in ec_resume (fop=0x7fb1680b965c, error=0) at ec-common.c:289
#7 0x00007fb16cf45de7 in ec_complete (fop=0x7fb1680b965c) at ec-common.c:362
#8 0x00007fb16cf674fb in ec_inode_write_cbk (frame=0x7fb1682a188c, this=0x7fb16801e750, cookie=0x3, op_ret=0, op_errno=0, prestat=0x7fb167efd970, poststat=0x7fb167efd900, xdata=0x7fb15c16051c) at ec-inode-write.c:65
#9 0x00007fb16cf68816 in ec_setattr_cbk (frame=0x7fb1682a188c, cookie=0x3, this=0x7fb16801e750, op_ret=0, op_errno=0, prestat=0x7fb167efd970, poststat=0x7fb167efd900, xdata=0x7fb15c16051c) at ec-inode-write.c:349
#10 0x00007fb16d20168f in client3_3_setattr_cbk (req=0x7fb15c2ab06c, iov=0x7fb15c2ab0ac, count=1, myframe=0x7fb15c1f4c0c) at client-rpc-fops.c:2264
#11 0x00007fb17af583aa in rpc_clnt_handle_reply (clnt=0x7fb168067ed0, pollin=0x7fb15c017b40) at rpc-clnt.c:790
#12 0x00007fb17af58903 in rpc_clnt_notify (trans=0x7fb168068330, mydata=0x7fb168067f00, event=RPC_TRANSPORT_MSG_RECEIVED, data=0x7fb15c017b40) at rpc-clnt.c:961
#13 0x00007fb17af54b7b in rpc_transport_notify (this=0x7fb168068330, event=RPC_TRANSPORT_MSG_RECEIVED, data=0x7fb15c017b40) at rpc-transport.c:541
#14 0x00007fb1705a4d0d in socket_event_poll_in (this=0x7fb168068330) at socket.c:2265
#15 0x00007fb1705a525c in socket_event_handler (fd=17, idx=8, data=0x7fb168068330, poll_in=1, poll_out=0, poll_err=0) at socket.c:2395
#16 0x00007fb17b1fa579 in event_dispatch_epoll_handler (event_pool=0x1c48ff0, event=0x7fb167efdea0) at event-epoll.c:571
#17 0x00007fb17b1fa959 in event_dispatch_epoll_worker (data=0x7fb16803ded0) at event-epoll.c:674
#18 0x00007fb179fdf52a in start_thread () from /lib64/libpthread.so.0
#19 0x00007fb17992e22d in clone () from /lib64/libc.so.6

Although the assertion comes from EC, I think it is also related to the issue described above.

Given that an upstream patch has been posted at http://review.gluster.org/#/c/15846, moving this bug to POST. Clearing Needinfo.
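For anyone revisiting this, the failed run above can be inspected roughly as follows. Only the glusterfs binary path and the rebalance log naming follow what is shown or standard GlusterFS conventions; the volume name, brick paths, and core-file path are assumptions.

```sh
# Check whether the remove-brick/rebalance run has failed (names assumed):
gluster volume remove-brick testvol server1:/bricks/b1 server2:/bricks/b2 status

# The rebalance process normally logs to /var/log/glusterfs/<VOLNAME>-rebalance.log:
grep -iE "assert|error" /var/log/glusterfs/testvol-rebalance.log | tail

# Load the core dumped by the rebalance glusterfs process and print the
# backtrace, as was done above (core-file path is an assumption):
gdb -ex bt -ex quit /usr/local/sbin/glusterfs /var/core/core.glusterfs.12345
```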