Bug 822086 - Crash in rebalance when network goes down
Summary: Crash in rebalance when network goes down
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: GlusterFS
Classification: Community
Component: core
Version: pre-release
Hardware: x86_64
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: shishir gowda
QA Contact: shylesh
URL:
Whiteboard:
Depends On:
Blocks: 817967
Reported: 2012-05-16 10:07 UTC by shylesh
Modified: 2013-12-09 01:31 UTC (History)

Fixed In Version: glusterfs-3.4.0
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2013-07-24 17:26:12 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions: 3.3.0qa43
Embargoed:


Attachments
sos report (5.76 MB, application/x-xz)
2012-05-16 10:07 UTC, shylesh

Description shylesh 2012-05-16 10:07:44 UTC
Created attachment 584921
sos report

Description of problem:
While rebalance was running, the network was brought down and the rebalance process crashed.

Version-Release number of selected component (if applicable):
3.3.0qa41

How reproducible:


Steps to Reproduce:
1. Create a 2x2 distributed-replicate volume (4-node cluster).
2. Fill it with enough data that rebalance takes a while to finish.
3. Run add-brick and start rebalance.
4. Run "service network stop" on one of the nodes.
  
Actual results:

[root@gqac022 mnt]# gluster volume rebalance giga status
                                    Node Rebalanced-files          size       scanned      failures         status
                               ---------      -----------   -----------   -----------   -----------   ------------
                               localhost               25     25000000         3837            0      completed
                            10.16.157.66               30     30000000         1167            0         failed
                            10.16.157.72               29     29000000         3087            0      completed
                            10.16.157.69               16     16000000         3242            0      completed


After the node comes back, the status shows "failed" for that particular node. The rebalance process had crashed on that machine.
 
 
Additional info:
(gdb) bt
#0  0x0000000000000000 in ?? ()
#1  0x0000003ee7e0fe05 in rpc_clnt_submit (rpc=<value optimized out>, prog=0x1c35a00, procnum=5, cbkfn=0, proghdr=0x7f680c000070, 
    proghdrcount=1, progpayload=0x0, progpayloadcount=0, iobref=0x7f680c000c30, frame=0x7f6876ea8ea4, rsphdr=0x0, rsphdr_count=0, 
    rsp_payload=0x0, rsp_payload_count=0, rsp_iobref=0x0) at rpc-clnt.c:1533
#2  0x0000000000407ff8 in mgmt_submit_request ()
#3  0x000000000040cca8 in glusterfs_rebalance_event_notify ()
#4  0x00007f6873812964 in gf_defrag_start_crawl (data=<value optimized out>) at dht-rebalance.c:1486
#5  0x0000003ee7a4b322 in synctask_wrap (old_task=<value optimized out>) at syncop.c:120
#6  0x000000358ea43610 in ?? () from /lib64/libc.so.6
#7  0x0000000000000000 in ?? ()


(gdb) p rsp_iobref
$1 = (struct iobref *) 0x0

Attached the sosreport. Volume name: giga
Log path: var/log/glusterfs/giga-rebalance.log

Comment 1 shylesh 2012-05-16 11:05:10 UTC
#1  0x0000003ee7e0fe05 in rpc_clnt_submit (rpc=<value optimized out>, prog=0x1c35a00, procnum=5, cbkfn=0, proghdr=0x7f680c000070, 
    proghdrcount=1, progpayload=0x0, progpayloadcount=0, iobref=0x7f680c000c30, frame=0x7f6876ea8ea4, rsphdr=0x0, rsphdr_count=0, 
    rsp_payload=0x0, rsp_payload_count=0, rsp_iobref=0x0) at rpc-clnt.c:1533



(gdb) p cbkfn
$3 = (fop_cbk_fn_t) 0

Comment 2 shishir gowda 2012-05-17 05:39:45 UTC
After rebalance runs to completion, it sends its status to the local glusterd. The crash happens when there is a network disconnect at that point, because the implementation does not pass a cbk_fn (callback) with the request.
Setting the priority/severity to medium, as the crash happens only after rebalance has completed, and only when the network has been brought down.
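
For readers unfamiliar with this failure mode, here is a minimal C sketch of it. It is not GlusterFS code and does not use the rpc-clnt API: reply_cbk_t, submit_request and notify_cbk are hypothetical names made up for illustration. It assumes that the submit path unwinds a request on a dead connection by calling the caller's callback directly, which would be consistent with frame #0 sitting at address 0x0 when cbkfn=0, but that detail is not confirmed here.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical reply-callback type; NOT the GlusterFS fop_cbk_fn_t. */
typedef int (*reply_cbk_t)(int status, void *frame);

/*
 * Hypothetical stand-in for a submit routine.
 * Assumption: when the connection is down, the request is unwound by
 * calling the caller's callback with an error status.
 */
int submit_request(int connected, reply_cbk_t cbk, void *frame)
{
    if (!connected) {
        if (cbk == NULL) {
            /* Without this guard, the call below would jump to address 0. */
            fprintf(stderr, "disconnected and no callback registered\n");
            return -2;
        }
        cbk(-1, frame); /* report the failure to the caller */
        return -1;
    }
    /* Connected: the request would be queued and answered later. */
    return 0;
}

/* A trivial callback that records the status it was given. */
int notify_cbk(int status, void *frame)
{
    int *seen = frame;
    *seen = status;
    return 0;
}
```

In this sketch, submit_request(0, NULL, &seen) takes the guarded branch where unguarded code would crash, while submit_request(0, notify_cbk, &seen) fails cleanly and leaves -1 in seen. Registering a callback at the call site is the approach sketched here; a NULL check in the submit path is another possible defence.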

Comment 3 Anand Avati 2012-05-19 02:28:59 UTC
CHANGE: http://review.gluster.com/3359 (glusterfs/rebalance: Register cbk for glusterfs_rebalance_event_notify) merged in master by Anand Avati (avati)

Comment 4 shylesh 2012-05-25 04:58:51 UTC
The crash no longer occurs.

