Bug 821148

Summary: rebalance process crash
Product: [Community] GlusterFS
Component: distribute
Version: mainline
Status: CLOSED DUPLICATE
Severity: high
Priority: medium
Keywords: Triaged
Hardware: Unspecified
OS: Unspecified
Reporter: Shwetha Panduranga <shwetha.h.panduranga>
Assignee: shishir gowda <sgowda>
CC: gluster-bugs, nsathyan
Doc Type: Bug Fix
Type: Bug
Clones: 849301 (view as bug list)
Bug Blocks: 849301, 858467
Last Closed: 2012-09-26 05:39:53 UTC
Attachments (flags: none):
- rebalance log
- Backtrace of core
- Exact steps executed to recreate

Description Shwetha Panduranga 2012-05-12 13:54:22 UTC
Created attachment 584004 [details]
rebalance log

Description of problem:
------------------------

Core was generated by `/usr/local/sbin/glusterfs -s localhost --volfile-id vol --xlator-option *dht.us'.
Program terminated with signal 6, Aborted.
#0  0x0000003271e32885 in raise () from /lib64/libc.so.6
Missing separate debuginfos, use: debuginfo-install glibc-2.12-1.47.el6.x86_64 libgcc-4.4.6-3.el6.x86_64 openssl-1.0.0-20.el6.x86_64 zlib-1.2.3-27.el6.x86_64
(gdb) bt
#0  0x0000003271e32885 in raise () from /lib64/libc.so.6
#1  0x0000003271e34065 in abort () from /lib64/libc.so.6
#2  0x0000003271e2b9fe in __assert_fail_base () from /lib64/libc.so.6
#3  0x0000003271e2bac0 in __assert_fail () from /lib64/libc.so.6
#4  0x00007f22f25fad6e in __inode_path (inode=0x7f22e54700e0, name=0x0, bufp=0x7fffaa45b278) at inode.c:1090
#5  0x00007f22f25fb156 in inode_path (inode=0x7f22e54700e0, name=0x0, bufp=0x7fffaa45b278) at inode.c:1191
#6  0x00007f22ee05ab1d in protocol_client_reopendir (this=0x10b1570, fdctx=0x11ba490) at client-handshake.c:1096
#7  0x00007f22ee05b358 in client_post_handshake (frame=0x7f22f120c0b8, this=0x10b1570) at client-handshake.c:1281
#8  0x00007f22ee05bb98 in client_setvolume_cbk (req=0x7f22e685b04c, iov=0x7f22e685b08c, count=1, myframe=0x7f22f120c0b8) at client-handshake.c:1439
#9  0x00007f22f23bca48 in rpc_clnt_handle_reply (clnt=0x1139340, pollin=0x11c6060) at rpc-clnt.c:788
#10 0x00007f22f23bcde5 in rpc_clnt_notify (trans=0x1148ec0, mydata=0x1139370, event=RPC_TRANSPORT_MSG_RECEIVED, data=0x11c6060) at rpc-clnt.c:907
#11 0x00007f22f23b8ec8 in rpc_transport_notify (this=0x1148ec0, event=RPC_TRANSPORT_MSG_RECEIVED, data=0x11c6060) at rpc-transport.c:489
#12 0x00007f22eee8e280 in socket_event_poll_in (this=0x1148ec0) at socket.c:1677
#13 0x00007f22eee8e804 in socket_event_handler (fd=13, idx=8, data=0x1148ec0, poll_in=1, poll_out=0, poll_err=0) at socket.c:1792
#14 0x00007f22f2613de8 in event_dispatch_epoll_handler (event_pool=0x108e500, events=0x10a7fd0, i=0) at event.c:785
#15 0x00007f22f261400b in event_dispatch_epoll (event_pool=0x108e500) at event.c:847
#16 0x00007f22f2614396 in event_dispatch (event_pool=0x108e500) at event.c:947
#17 0x0000000000408461 in main (argc=27, argv=0x7fffaa45b938) at glusterfsd.c:1674
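
Reading frames 4 through 7: after the reconnect handshake,
client_post_handshake() walks the cached fds, and protocol_client_reopendir()
calls inode_path() with name=NULL to rebuild the path of a cached directory
fd; __inode_path() then fails an assert() and abort()s (signal 6), presumably
because the inode can no longer be resolved back to the root through its
dentry chain. A minimal gdb session along these lines (core file path
assumed) can confirm the state of the inode in a similar core:

# Install the debug symbols gdb asked for:
debuginfo-install glibc-2.12-1.47.el6.x86_64 libgcc-4.4.6-3.el6.x86_64 \
    openssl-1.0.0-20.el6.x86_64 zlib-1.2.3-27.el6.x86_64

# Load the core against the binary that produced it (core path assumed):
gdb /usr/local/sbin/glusterfs /path/to/core

# Inside gdb:
(gdb) frame 4        # __inode_path at inode.c:1090, called with name=0x0
(gdb) print *inode   # gfid, ref count and dentry list of the inode
(gdb) frame 6        # protocol_client_reopendir
(gdb) print *fdctx   # the cached directory fd context being reopened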


Version-Release number of selected component (if applicable):
------------------------------------------------------------
3.3.0qa41

How reproducible:
-------------------


Steps to Reproduce:
---------------------
1. Create a distribute-replicate volume (2x2) and start it.
2. Create FUSE and NFS mounts.
3. Run gfsc1.sh from the FUSE mount and nfsc1.sh from the NFS mount.
4. Add a brick pair to the volume (add-brick).
5. Start rebalance.
6. Check rebalance status.
7. Stop rebalance.
8. Bring down bricks, one from each replica set, so that one brick stays
online in each replica set.
9. Bring the bricks back online.
10. Start rebalance with force.
11. Check rebalance status.
12. Stop rebalance.

Repeat steps 8 to 12 three to four times.

13. Kill glusterd on m1.
14. Start rebalance on m2.
15. Start glusterd on m1.
16. Start rebalance on m1.

Repeat steps 8 to 16 once again.

17. Stop the volume (the volume could not be stopped).
18. killall glusterfs; killall glusterfsd; killall glusterd (this caused the crash).
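
For reference, here is a scripted sketch of steps 1 to 12 using the 3.3-era
gluster CLI. Hostnames and brick paths are taken from the volume info below;
mount points and workload script paths are assumptions, not the reporter's
verbatim commands (those are in the "Exact steps executed to recreate"
attachment).

# Step 1: create and start the 2x2 distribute-replicate volume
gluster volume create vol replica 2 \
    10.16.159.196:/export_b1/dir1 10.16.159.196:/export_c1/dir1 \
    10.16.159.188:/export_b1/dir1 10.16.159.188:/export_c1/dir1
gluster volume start vol

# Step 2: FUSE and NFS mounts (mount points assumed)
mount -t glusterfs localhost:/vol /mnt/fuse
mount -t nfs -o vers=3 localhost:/vol /mnt/nfs

# Step 3: run the workload scripts against each mount (paths assumed)
./gfsc1.sh /mnt/fuse &
./nfsc1.sh /mnt/nfs &

# Step 4: add a brick pair, turning the volume into 3 x 2 = 6 bricks
gluster volume add-brick vol \
    10.16.159.196:/export_d1/dir1 10.16.159.188:/export_d1/dir1

# Steps 5-7: start, query, stop rebalance
gluster volume rebalance vol start
gluster volume rebalance vol status
gluster volume rebalance vol stop

# Steps 8-9: kill one brick process per replica set, then respawn the
# dead bricks; "start ... force" restarts missing brick processes
gluster volume start vol force

# Steps 10-12: force-start, query, stop rebalance
gluster volume rebalance vol start force
gluster volume rebalance vol status
gluster volume rebalance vol stop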
  
Actual results:
--------------
The rebalance process crashed.


Additional info: Volume info after add-brick
----------------------------------------------

[root@AFR-Server3 ~]# gluster volume info
 
Volume Name: vol
Type: Distributed-Replicate
Volume ID: 87a4d691-3cbc-4d8f-9f59-da6bb4d1fbab
Status: Started
Number of Bricks: 3 x 2 = 6
Transport-type: tcp
Bricks:
Brick1: 10.16.159.196:/export_b1/dir1
Brick2: 10.16.159.196:/export_c1/dir1
Brick3: 10.16.159.188:/export_b1/dir1
Brick4: 10.16.159.188:/export_c1/dir1
Brick5: 10.16.159.196:/export_d1/dir1
Brick6: 10.16.159.188:/export_d1/dir1
Options Reconfigured:
performance.stat-prefetch: off
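
Note that the replica sets are the consecutive brick pairs above
(Brick1+Brick2, Brick3+Brick4, Brick5+Brick6); step 8 takes one brick out of
each such pair. To check which brick processes are online around steps 8-9
(the placeholder PID is illustrative):

# Show per-brick online state, port and PID:
gluster volume status vol

# Take a single brick down using a PID from the output above:
kill -15 <brick-pid>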

Comment 1 Shwetha Panduranga 2012-05-12 13:55:20 UTC
Created attachment 584005 [details]
Backtrace of core

Comment 2 Shwetha Panduranga 2012-05-12 13:56:14 UTC
Created attachment 584010 [details]
Exact steps executed to recreate

Comment 3 shishir gowda 2012-09-26 05:39:53 UTC
This appears to be a duplicate of bug 826080, which has been fixed upstream.

*** This bug has been marked as a duplicate of bug 826080 ***