Bug 810488 - Crash in glusterfs while rebalance and remove-brick triggered at the same time
Summary: Crash in glusterfs while rebalance and remove-brick triggered at the same time
Keywords:
Status: CLOSED WORKSFORME
Alias: None
Product: GlusterFS
Classification: Community
Component: core
Version: pre-release
Hardware: x86_64
OS: Linux
Priority: urgent
Severity: urgent
Target Milestone: ---
Assignee: shishir gowda
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2012-04-06 11:46 UTC by shylesh
Modified: 2015-12-01 16:45 UTC
CC: 2 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2012-04-17 09:43:11 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:


Attachments
rebalance logs (476.21 KB, application/x-gzip)
2012-04-06 11:46 UTC, shylesh

Description shylesh 2012-04-06 11:46:08 UTC
Created attachment 575716 [details]
rebalance logs

Description of problem:
While rebalance is in progress, doing I/O on the mount point and initiating remove-brick on the same volume leads to a crash.

Version-Release number of selected component (if applicable):
3.3.0qa33

How reproducible:


Steps to Reproduce:
1. Create a distribute volume with 6 bricks
2. Initiate rebalance
3. Do some I/O on the mount point while rebalance is in progress
4. Initiate remove-brick on the same volume
  
Actual results:
glusterfs crashed

Expected results:
remove-brick should not be allowed to start while rebalance is in progress

Additional info:
Program terminated with signal 11, Segmentation fault.
#0  0x0000003d7b0157f8 in ?? () from /lib64/libgcc_s.so.1
Missing separate debuginfos, use: debuginfo-install glibc-2.12-1.47.el6_2.5.x86_64 libgcc-4.4.6-3.el6.x86_64


==================================================
(gdb) bt
#0  0x0000003d7b0157f8 in ?? () from /lib64/libgcc_s.so.1
#1  0x00007fba83a74933 in xlator_notify (xl=0x2274fb0, event=6, data=0x226ee10) at xlator.c:457
#2  0x00007fba83a87dda in default_notify (this=0x226ee10, event=6, data=0x0) at defaults.c:1334
#3  0x00007fba7f578c7c in client_rpc_notify (rpc=0x23a1c40, mydata=0x226ee10, event=RPC_CLNT_DISCONNECT, 
    data=0x0) at client.c:2107
#4  0x00007fba8384fe2b in rpc_clnt_notify (trans=0x23b16a0, mydata=0x23a1c70, event=RPC_TRANSPORT_DISCONNECT, 
    data=0x23b16a0) at rpc-clnt.c:887
#5  0x00007fba8384bee4 in rpc_transport_notify (this=0x23b16a0, event=RPC_TRANSPORT_DISCONNECT, 
    data=0x23b16a0) at rpc-transport.c:498
#6  0x00007fba803cd1d3 in socket_event_poll_err (this=0x23b16a0) at socket.c:694
#7  0x00007fba803d188c in socket_event_handler (fd=13, idx=6, data=0x23b16a0, poll_in=1, poll_out=0, 
    poll_err=16) at socket.c:1808
#8  0x00007fba83aa8640 in event_dispatch_epoll_handler (event_pool=0x223adb0, events=0x2268830, i=0)
    at event.c:794
#9  0x00007fba83aa8863 in event_dispatch_epoll (event_pool=0x223adb0) at event.c:856
#10 0x00007fba83aa8bee in event_dispatch (event_pool=0x223adb0) at event.c:956
#11 0x000000000040801c in main (argc=21, argv=0x7fffb6c96b88) at glusterfsd.c:1650
===========================================================================
Attached the rebalance logs.
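
For context on the crash path: frames #3 through #1 show the client xlator turning the RPC disconnect into a child-down event (event=6), which default_notify then forwards up the graph, and xlator_notify finally dispatching into an invalid address (the "??" frame inside libgcc_s). The following is a minimal sketch of that propagation pattern only; every type, name, and helper below is a simplified stand-in, not the actual GlusterFS source.

==================================================
/* Sketch only: simplified stand-ins for the notify chain seen in the
 * backtrace (client_rpc_notify -> default_notify -> xlator_notify). */

typedef struct xlator xlator_t;

typedef struct xlator_list {
        xlator_t           *xlator;
        struct xlator_list *next;
} xlator_list_t;

struct xlator {
        const char     *name;
        int             init_succeeded;
        xlator_list_t  *parents;
        int           (*notify) (xlator_t *this, int event, void *data);
};

enum { GF_EVENT_CHILD_DOWN = 6 };   /* event=6 in the backtrace */

/* Frame #1: generic entry point that invokes an xlator's notify hook.
 * The "??" frame at #0 suggests xl, or its notify pointer, is stale by
 * the time the disconnect arrives, so the indirect call lands in junk. */
static int
xlator_notify_sketch (xlator_t *xl, int event, void *data)
{
        return xl->notify (xl, event, data);
}

/* Frame #2: the default behaviour for CHILD_DOWN is to walk this->parents
 * and notify every parent xlator that initialised successfully. */
static int
default_notify_sketch (xlator_t *this, int event, void *data)
{
        xlator_list_t *parent = this->parents;

        (void) data;

        while (parent) {
                if (parent->xlator->init_succeeded)   /* cf. defaults.c:1333 */
                        xlator_notify_sketch (parent->xlator, event, this);
                parent = parent->next;
        }
        return 0;
}
==================================================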

Comment 1 shylesh 2012-04-08 08:36:10 UTC
The same crash was also generated by the following steps:

1. Create a 2-brick distribute volume from a single node
2. Keep doing some I/O on the mount
3. Attach another node to the cluster
4. Add a brick from the new node to this volume
5. Initiate fix-layout and rebalance.

Comment 2 shylesh 2012-04-08 08:46:41 UTC
The same steps lead to another crash whose stack frames are almost identical.



Program terminated with signal 11, Segmentation fault.
#0  0x00007fcd94f82da4 in default_notify (this=0x2502040, event=6, data=0x24ff840) at defaults.c:1333
1333                            if (parent->xlator->init_succeeded)
Missing separate debuginfos, use: debuginfo-install glibc-2.12-1.47.el6.x86_64 libgcc-4.4.6-3.el6.x86_64

======================================================================
(gdb) p this
$1 = (xlator_t *) 0x2502040
(gdb) p this->ctx
$2 = (glusterfs_ctx_t *) 0x24b4010
(gdb) p this->ctx->master
$3 = (void *) 0x0
(gdb) p this->graph
$4 = (glusterfs_graph_t *) 0x24fa420
====================================================================

(gdb) bt
#0  0x00007fcd94f82da4 in default_notify (this=0x2502040, event=6, data=0x24ff840) at defaults.c:1333
#1  0x00007fcd90842759 in dht_notify (this=0x2502040, event=6, data=0x24ff840) at dht-common.c:4703
#2  0x00007fcd90853bed in notify (this=0x2502040, event=6, data=0x24ff840) at dht.c:201
#3  0x00007fcd94f6f933 in xlator_notify (xl=0x2502040, event=6, data=0x24ff840) at xlator.c:457
#4  0x00007fcd94f82dda in default_notify (this=0x24ff840, event=6, data=0x0) at defaults.c:1334
#5  0x00007fcd90a73c7c in client_rpc_notify (rpc=0x2579d70, mydata=0x24ff840, event=RPC_CLNT_DISCONNECT, 
    data=0x0) at client.c:2107
#6  0x00007fcd94d4ae2b in rpc_clnt_notify (trans=0x25897d0, mydata=0x2579da0, event=RPC_TRANSPORT_DISCONNECT, 
    data=0x25897d0) at rpc-clnt.c:887
#7  0x00007fcd94d46ee4 in rpc_transport_notify (this=0x25897d0, event=RPC_TRANSPORT_DISCONNECT, 
    data=0x25897d0) at rpc-transport.c:498
#8  0x00007fcd918c81d3 in socket_event_poll_err (this=0x25897d0) at socket.c:694
#9  0x00007fcd918cc88c in socket_event_handler (fd=9, idx=4, data=0x25897d0, poll_in=1, poll_out=0, 
    poll_err=16) at socket.c:1808
#10 0x00007fcd94fa3640 in event_dispatch_epoll_handler (event_pool=0x24cbdb0, events=0x24f9800, i=0)
    at event.c:794
#11 0x00007fcd94fa3863 in event_dispatch_epoll (event_pool=0x24cbdb0) at event.c:856
#12 0x00007fcd94fa3bee in event_dispatch (event_pool=0x24cbdb0) at event.c:956
#13 0x000000000040801c in main (argc=21, argv=0x7fff8844c918) at glusterfsd.c:1650
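
The gdb prints above show this->ctx->master == 0x0 and the fault at defaults.c:1333, i.e. the parent->xlator->init_succeeded dereference while walking this->parents. One plausible reading, given that rebalance/remove-brick can trigger a graph change while the disconnect notification is still in flight, is that the parents list (or the xlator it points to) is already stale when the walk runs. Purely as an illustration of that reading, not a proposed patch or the actual fix, a NULL-guarded version of the walk would look like this (same stand-in types as the sketch in comment 0):

==================================================
/* Sketch only: a defensively guarded parents walk, reusing the stand-in
 * xlator_t / xlator_list_t types from the earlier sketch. The stale-graph
 * failure mode guarded against here is an assumption, not a confirmed
 * root cause. */
static int
default_notify_guarded_sketch (xlator_t *this, int event, void *data)
{
        xlator_list_t *parent = this ? this->parents : NULL;

        (void) data;

        while (parent) {
                /* Guard both the xlator pointer and its notify hook:
                 * either can be stale if the graph is being replaced. */
                if (parent->xlator && parent->xlator->notify &&
                    parent->xlator->init_succeeded)
                        parent->xlator->notify (parent->xlator, event, this);
                parent = parent->next;
        }
        return 0;
}
==================================================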

Comment 3 shishir gowda 2012-04-17 06:02:04 UTC
Can you please check if this bug is still valid?

Comment 4 shylesh 2012-04-17 09:43:11 UTC
This bug is not reproducible on the latest master.

