Bug 808402 - rebalance process crashed
Status: CLOSED CURRENTRELEASE
Product: GlusterFS
Classification: Community
Component: distribute
Version: mainline
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Assigned To: shishir gowda
QA Contact: Shwetha Panduranga
Blocks: 817967
Reported: 2012-03-30 06:40 EDT by Shwetha Panduranga
Modified: 2015-12-01 11:45 EST (History)

Fixed In Version: glusterfs-3.4.0
Doc Type: Bug Fix
Last Closed: 2013-07-24 13:37:37 EDT
Attachments
rebalance log (349.29 KB, text/x-log), 2012-03-30 06:40 EDT, Shwetha Panduranga
Description Shwetha Panduranga 2012-03-30 06:40:06 EDT
Created attachment 573949 [details]
rebalance log

Description of problem:
Core was generated by `/usr/local/sbin/glusterfs -s localhost --volfile-id dstore --xlator-option *dht'.
Program terminated with signal 6, Aborted.
#0  0x0000003638632885 in raise () from /lib64/libc.so.6
Missing separate debuginfos, use: debuginfo-install glibc-2.12-1.47.el6.x86_64 libgcc-4.4.6-3.el6.x86_64
(gdb) bt full
#0  0x0000003638632885 in raise () from /lib64/libc.so.6
No symbol table info available.
#1  0x0000003638634065 in abort () from /lib64/libc.so.6
No symbol table info available.
#2  0x000000363866f7a7 in __libc_message () from /lib64/libc.so.6
No symbol table info available.
#3  0x00000036386750c6 in malloc_printerr () from /lib64/libc.so.6
No symbol table info available.
#4  0x00007fba53136eb9 in client3_1_xattrop_cbk (req=0x7fba4b675a80, iov=0x7fba4b675ac0, count=1, myframe=0x7fba5645a928) at client3_1-fops.c:1711
        frame = 0x7fba5645a928
        dict = 0x19d899c
        rsp = {op_ret = 0, op_errno = 0, dict = {dict_len = 100, dict_val = 0x189cc60 ""}, xdata = {xdata_len = 0, xdata_val = 0x0}}
        ret = 116
        op_errno = 0
        local = 0x1914fd0
        this = 0x1868470
        xdata = 0x0
        __FUNCTION__ = "client3_1_xattrop_cbk"
#5  0x00007fba574029fc in rpc_clnt_handle_reply (clnt=0x1928be0, pollin=0x19e0e10) at rpc-clnt.c:797
        conn = 0x1928c10
        saved_frame = 0x192914c
        ret = 0
        req = 0x7fba4b675a80
        xid = 7774
        __FUNCTION__ = "rpc_clnt_handle_reply"
#6  0x00007fba57402d99 in rpc_clnt_notify (trans=0x1938770, mydata=0x1928c10, event=RPC_TRANSPORT_MSG_RECEIVED, data=0x19e0e10) at rpc-clnt.c:916
        conn = 0x1928c10
        clnt = 0x1928be0
        ret = -1
        req_info = 0x0
        pollin = 0x19e0e10
        tv = {tv_sec = 0, tv_usec = 0}
#7  0x00007fba573fee7c in rpc_transport_notify (this=0x1938770, event=RPC_TRANSPORT_MSG_RECEIVED, data=0x19e0e10) at rpc-transport.c:498
        ret = -1
        __FUNCTION__ = "rpc_transport_notify"
#8  0x00007fba53f83270 in socket_event_poll_in (this=0x1938770) at socket.c:1686
---Type <return> to continue, or q <return> to quit---
        ret = 0
        pollin = 0x19e0e10
#9  0x00007fba53f837f4 in socket_event_handler (fd=11, idx=5, data=0x1938770, poll_in=1, poll_out=0, poll_err=0) at socket.c:1801
        this = 0x1938770
        priv = 0x1938b20
        ret = 0
        __FUNCTION__ = "socket_event_handler"
#10 0x00007fba5765a628 in event_dispatch_epoll_handler (event_pool=0x18313a0, events=0x185fb30, i=0) at event.c:794
        event_data = 0x185fb34
        handler = 0x7fba53f835d7 <socket_event_handler>
        data = 0x1938770
        idx = 5
        ret = -1
        __FUNCTION__ = "event_dispatch_epoll_handler"
#11 0x00007fba5765a84b in event_dispatch_epoll (event_pool=0x18313a0) at event.c:856
        events = 0x185fb30
        size = 2
        i = 0
        ret = 2
        __FUNCTION__ = "event_dispatch_epoll"
#12 0x00007fba5765abd6 in event_dispatch (event_pool=0x18313a0) at event.c:956
        ret = -1
        __FUNCTION__ = "event_dispatch"
#13 0x0000000000408057 in main (argc=21, argv=0x7fff603a47c8) at glusterfsd.c:1650
        ctx = 0x1819010
        ret = 0
        __FUNCTION__ = "main"
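For reference, a full backtrace like the one above can be regenerated from the core file along these lines (the core file name is a placeholder; the debuginfo packages are the ones gdb itself suggests in its "Missing separate debuginfos" message):

```shell
# Install the debug symbols gdb asked for (exact package NVRs come from gdb's own hint).
debuginfo-install glibc-2.12-1.47.el6.x86_64 libgcc-4.4.6-3.el6.x86_64

# Load the core against the binary that produced it and dump the full backtrace
# non-interactively. "core.<pid>" is a placeholder for the actual core file name.
gdb /usr/local/sbin/glusterfs core.<pid> -batch -ex "bt full"
```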


Version-Release number of selected component (if applicable):
mainline

Steps to Reproduce:
1. Create a distribute-replicate volume (2x2). Start the volume.
2. Create FUSE and NFS mounts.
3. Start dd in a loop on both the NFS and FUSE mounts.
4. Add bricks to the volume (add-brick).
5. Start rebalance.
6. Stop rebalance.
7. Bring down one brick from each replica pair.
8. Bring the bricks back online.
9. Force-start rebalance.
10. Bring down one brick from each replica pair.
11. Query the rebalance status.
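The steps above translate roughly into the following gluster CLI sequence. The volume name dstore comes from this report; host names, brick paths, and the brick-kill mechanics are illustrative and will differ per setup:

```shell
# 1. Create and start a 2x2 distribute-replicate volume (brick paths illustrative).
gluster volume create dstore replica 2 \
    server1:/bricks/b1 server2:/bricks/b2 \
    server1:/bricks/b3 server2:/bricks/b4
gluster volume start dstore

# 2. Mount via FUSE and NFS; 3. run dd in a loop on both mounts.
mount -t glusterfs server1:/dstore /mnt/fuse
mount -t nfs -o vers=3 server1:/dstore /mnt/nfs
while true; do dd if=/dev/zero of=/mnt/fuse/f.$RANDOM bs=1M count=10; done &
while true; do dd if=/dev/zero of=/mnt/nfs/f.$RANDOM bs=1M count=10; done &

# 4. Add a replica pair; 5. start rebalance; 6. stop it.
gluster volume add-brick dstore server1:/bricks/b5 server2:/bricks/b6
gluster volume rebalance dstore start
gluster volume rebalance dstore stop

# 7.-8. Kill one brick process per replica pair, then restart the dead bricks.
kill <brick-pid>                      # one brick from each pair
gluster volume start dstore force     # brings the killed bricks back online

# 9. Force-start rebalance; 10. kill bricks again; 11. query status.
gluster volume rebalance dstore start force
kill <brick-pid>
gluster volume rebalance dstore status
```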

Actual results:
The rebalance process crashed.

Additional Info:
Rebalance status before the crash:
-----------------------------
[03/30/12 - 20:56:54 root@APP-SERVER1 ~]# gluster volume rebalance dstore status
                                    Node Rebalanced-files          size       scanned         status
                               ---------      -----------   -----------   -----------   ------------
                               localhost               36    937426944         1198   in progress
                            192.168.2.36                0            0          257     completed


Rebalance status after the crash:
------------------------------
[03/30/12 - 20:57:04 root@APP-SERVER1 ~]# gluster volume rebalance dstore status

                                    Node Rebalanced-files          size       scanned         status
                               ---------      -----------   -----------   -----------   ------------
                               localhost               36    937426944         1198     completed
                            192.168.2.36                0            0          257     completed
Comment 1 Anand Avati 2012-04-03 01:40:14 EDT
CHANGE: http://review.gluster.com/3062 (dht/rebalance: Send PARENT_DOWN event before cleanup in rebalance) merged in master by Vijay Bellur (vijay@gluster.com)
Comment 2 Anand Avati 2012-04-09 09:23:20 EDT
CHANGE: http://review.gluster.com/3107 (rebalance: revert sending PARENT_DOWN event to xlators) merged in master by Vijay Bellur (vijay@gluster.com)
Comment 3 Shwetha Panduranga 2012-05-24 07:39:40 EDT
Bug is fixed. Verified on 3.3.0qa43
