Bug 808402 - rebalance process crashed
Summary: rebalance process crashed
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: GlusterFS
Classification: Community
Component: distribute
Version: mainline
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: ---
Assignee: shishir gowda
QA Contact: Shwetha Panduranga
URL:
Whiteboard:
Depends On:
Blocks: 817967
 
Reported: 2012-03-30 10:40 UTC by Shwetha Panduranga
Modified: 2015-12-01 16:45 UTC (History)
3 users

Fixed In Version: glusterfs-3.4.0
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2013-07-24 17:37:37 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:


Attachments (Terms of Use)
rebalance log (349.29 KB, text/x-log)
2012-03-30 10:40 UTC, Shwetha Panduranga

Description Shwetha Panduranga 2012-03-30 10:40:06 UTC
Created attachment 573949 [details]
rebalance log

Description of problem:
Core was generated by `/usr/local/sbin/glusterfs -s localhost --volfile-id dstore --xlator-option *dht'.
Program terminated with signal 6, Aborted.
#0  0x0000003638632885 in raise () from /lib64/libc.so.6
Missing separate debuginfos, use: debuginfo-install glibc-2.12-1.47.el6.x86_64 libgcc-4.4.6-3.el6.x86_64
(gdb) bt full
#0  0x0000003638632885 in raise () from /lib64/libc.so.6
No symbol table info available.
#1  0x0000003638634065 in abort () from /lib64/libc.so.6
No symbol table info available.
#2  0x000000363866f7a7 in __libc_message () from /lib64/libc.so.6
No symbol table info available.
#3  0x00000036386750c6 in malloc_printerr () from /lib64/libc.so.6
No symbol table info available.
#4  0x00007fba53136eb9 in client3_1_xattrop_cbk (req=0x7fba4b675a80, iov=0x7fba4b675ac0, count=1, myframe=0x7fba5645a928) at client3_1-fops.c:1711
        frame = 0x7fba5645a928
        dict = 0x19d899c
        rsp = {op_ret = 0, op_errno = 0, dict = {dict_len = 100, dict_val = 0x189cc60 ""}, xdata = {xdata_len = 0, xdata_val = 0x0}}
        ret = 116
        op_errno = 0
        local = 0x1914fd0
        this = 0x1868470
        xdata = 0x0
        __FUNCTION__ = "client3_1_xattrop_cbk"
#5  0x00007fba574029fc in rpc_clnt_handle_reply (clnt=0x1928be0, pollin=0x19e0e10) at rpc-clnt.c:797
        conn = 0x1928c10
        saved_frame = 0x192914c
        ret = 0
        req = 0x7fba4b675a80
        xid = 7774
        __FUNCTION__ = "rpc_clnt_handle_reply"
#6  0x00007fba57402d99 in rpc_clnt_notify (trans=0x1938770, mydata=0x1928c10, event=RPC_TRANSPORT_MSG_RECEIVED, data=0x19e0e10) at rpc-clnt.c:916
        conn = 0x1928c10
        clnt = 0x1928be0
        ret = -1
        req_info = 0x0
        pollin = 0x19e0e10
        tv = {tv_sec = 0, tv_usec = 0}
#7  0x00007fba573fee7c in rpc_transport_notify (this=0x1938770, event=RPC_TRANSPORT_MSG_RECEIVED, data=0x19e0e10) at rpc-transport.c:498
        ret = -1
        __FUNCTION__ = "rpc_transport_notify"
#8  0x00007fba53f83270 in socket_event_poll_in (this=0x1938770) at socket.c:1686
        ret = 0
        pollin = 0x19e0e10
#9  0x00007fba53f837f4 in socket_event_handler (fd=11, idx=5, data=0x1938770, poll_in=1, poll_out=0, poll_err=0) at socket.c:1801
        this = 0x1938770
        priv = 0x1938b20
        ret = 0
        __FUNCTION__ = "socket_event_handler"
#10 0x00007fba5765a628 in event_dispatch_epoll_handler (event_pool=0x18313a0, events=0x185fb30, i=0) at event.c:794
        event_data = 0x185fb34
        handler = 0x7fba53f835d7 <socket_event_handler>
        data = 0x1938770
        idx = 5
        ret = -1
        __FUNCTION__ = "event_dispatch_epoll_handler"
#11 0x00007fba5765a84b in event_dispatch_epoll (event_pool=0x18313a0) at event.c:856
        events = 0x185fb30
        size = 2
        i = 0
        ret = 2
        __FUNCTION__ = "event_dispatch_epoll"
#12 0x00007fba5765abd6 in event_dispatch (event_pool=0x18313a0) at event.c:956
        ret = -1
        __FUNCTION__ = "event_dispatch"
#13 0x0000000000408057 in main (argc=21, argv=0x7fff603a47c8) at glusterfsd.c:1650
        ctx = 0x1819010
        ret = 0
        __FUNCTION__ = "main"


Version-Release number of selected component (if applicable):
mainline

Steps to Reproduce:
1. Create a 2x2 distribute-replicate volume and start it.
2. Create FUSE and NFS mounts.
3. Start dd in a loop on both the NFS and FUSE mounts.
4. Add bricks to the volume (add-brick).
5. Start rebalance.
6. Stop rebalance.
7. Bring down one brick from each replica pair.
8. Bring the bricks back online.
9. Force-start rebalance.
10. Bring down one brick from each replica pair.
11. Query the rebalance status.
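The steps above can be sketched as a shell session (a hedged sketch only: the volume name dstore matches this report, but the host names, brick paths, and mount points are assumptions, and steps 7/8/10 require manually killing and restarting the brick processes):

```shell
#!/bin/sh
# Sketch of the reproduction steps; brick paths and hosts are illustrative.
VOL=dstore

# 1. Create and start a 2x2 distribute-replicate volume.
gluster volume create $VOL replica 2 \
    server1:/bricks/b1 server2:/bricks/b2 \
    server1:/bricks/b3 server2:/bricks/b4
gluster volume start $VOL

# 2. Create FUSE and NFS mounts.
mount -t glusterfs server1:/$VOL /mnt/fuse
mount -t nfs -o vers=3 server1:/$VOL /mnt/nfs

# 3. Keep dd running in a loop on both mounts.
while :; do dd if=/dev/zero of=/mnt/fuse/f.$$ bs=1M count=100; done &
while :; do dd if=/dev/zero of=/mnt/nfs/n.$$  bs=1M count=100; done &

# 4-6. Add bricks, then start and stop rebalance.
gluster volume add-brick $VOL replica 2 \
    server1:/bricks/b5 server2:/bricks/b6
gluster volume rebalance $VOL start
gluster volume rebalance $VOL stop

# 7-8. Bring down one brick from each replica pair (kill the matching
# glusterfsd processes), then bring them back online:
gluster volume start $VOL force

# 9. Force-start rebalance.
gluster volume rebalance $VOL start force

# 10. Again bring down one brick from each replica pair, then:
# 11. Query the rebalance status.
gluster volume rebalance $VOL status
```

This is a cluster-dependent CLI fragment, not a standalone program; it cannot run without a live GlusterFS deployment.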

Actual results:
crash in rebalance process

Additional Info:
Rebalance status before the crash:
----------------------------------
[03/30/12 - 20:56:54 root@APP-SERVER1 ~]# gluster volume rebalance dstore status
                                    Node Rebalanced-files          size       scanned         status
                               ---------      -----------   -----------   -----------   ------------
                               localhost               36    937426944         1198   in progress
                            192.168.2.36                0            0          257     completed


Rebalance status after the crash:
---------------------------------
[03/30/12 - 20:57:04 root@APP-SERVER1 ~]# gluster volume rebalance dstore status

                                    Node Rebalanced-files          size       scanned         status
                               ---------      -----------   -----------   -----------   ------------
                               localhost               36    937426944         1198     completed
                            192.168.2.36                0            0          257     completed

Comment 1 Anand Avati 2012-04-03 05:40:14 UTC
CHANGE: http://review.gluster.com/3062 (dht/rebalance: Send PARENT_DOWN event before cleanup in rebalance) merged in master by Vijay Bellur (vijay)

Comment 2 Anand Avati 2012-04-09 13:23:20 UTC
CHANGE: http://review.gluster.com/3107 (rebalance: revert sending PARENT_DOWN event to xlators) merged in master by Vijay Bellur (vijay)

Comment 3 Shwetha Panduranga 2012-05-24 11:39:40 UTC
Bug is fixed. Verified on 3.3.0qa43

