Bug 821148 - rebalance process crash
Status: CLOSED DUPLICATE of bug 826080
Product: GlusterFS
Classification: Community
Component: distribute
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: high
Assigned To: shishir gowda
Keywords: Triaged
Depends On:
Blocks: 849301 858467
Reported: 2012-05-12 09:54 EDT by Shwetha Panduranga
Modified: 2013-12-08 20:31 EST (History)
CC: 2 users

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Clones: 849301
Last Closed: 2012-09-26 01:39:53 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---

Attachments
rebalance log (867.20 KB, text/x-log)
2012-05-12 09:54 EDT, Shwetha Panduranga
Backtrace of core (4.21 KB, application/octet-stream)
2012-05-12 09:55 EDT, Shwetha Panduranga
Exact steps executed to recreate (1.45 KB, application/octet-stream)
2012-05-12 09:56 EDT, Shwetha Panduranga

Description Shwetha Panduranga 2012-05-12 09:54:22 EDT
Created attachment 584004 [details]
rebalance log

Description of problem:

Core was generated by `/usr/local/sbin/glusterfs -s localhost --volfile-id vol --xlator-option *dht.us'.
Program terminated with signal 6, Aborted.
#0  0x0000003271e32885 in raise () from /lib64/libc.so.6
Missing separate debuginfos, use: debuginfo-install glibc-2.12-1.47.el6.x86_64 libgcc-4.4.6-3.el6.x86_64 openssl-1.0.0-20.el6.x86_64 zlib-1.2.3-27.el6.x86_64
(gdb) bt
#0  0x0000003271e32885 in raise () from /lib64/libc.so.6
#1  0x0000003271e34065 in abort () from /lib64/libc.so.6
#2  0x0000003271e2b9fe in __assert_fail_base () from /lib64/libc.so.6
#3  0x0000003271e2bac0 in __assert_fail () from /lib64/libc.so.6
#4  0x00007f22f25fad6e in __inode_path (inode=0x7f22e54700e0, name=0x0, bufp=0x7fffaa45b278) at inode.c:1090
#5  0x00007f22f25fb156 in inode_path (inode=0x7f22e54700e0, name=0x0, bufp=0x7fffaa45b278) at inode.c:1191
#6  0x00007f22ee05ab1d in protocol_client_reopendir (this=0x10b1570, fdctx=0x11ba490) at client-handshake.c:1096
#7  0x00007f22ee05b358 in client_post_handshake (frame=0x7f22f120c0b8, this=0x10b1570) at client-handshake.c:1281
#8  0x00007f22ee05bb98 in client_setvolume_cbk (req=0x7f22e685b04c, iov=0x7f22e685b08c, count=1, myframe=0x7f22f120c0b8) at client-handshake.c:1439
#9  0x00007f22f23bca48 in rpc_clnt_handle_reply (clnt=0x1139340, pollin=0x11c6060) at rpc-clnt.c:788
#10 0x00007f22f23bcde5 in rpc_clnt_notify (trans=0x1148ec0, mydata=0x1139370, event=RPC_TRANSPORT_MSG_RECEIVED, data=0x11c6060) at rpc-clnt.c:907
#11 0x00007f22f23b8ec8 in rpc_transport_notify (this=0x1148ec0, event=RPC_TRANSPORT_MSG_RECEIVED, data=0x11c6060) at rpc-transport.c:489
#12 0x00007f22eee8e280 in socket_event_poll_in (this=0x1148ec0) at socket.c:1677
#13 0x00007f22eee8e804 in socket_event_handler (fd=13, idx=8, data=0x1148ec0, poll_in=1, poll_out=0, poll_err=0) at socket.c:1792
#14 0x00007f22f2613de8 in event_dispatch_epoll_handler (event_pool=0x108e500, events=0x10a7fd0, i=0) at event.c:785
#15 0x00007f22f261400b in event_dispatch_epoll (event_pool=0x108e500) at event.c:847
#16 0x00007f22f2614396 in event_dispatch (event_pool=0x108e500) at event.c:947
#17 0x0000000000408461 in main (argc=27, argv=0x7fffaa45b938) at glusterfsd.c:1674

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1. Create a distribute-replicate volume (2x2). Start the volume.
2. Create FUSE and NFS mounts.
3. Run gfsc1.sh from the FUSE mount.
4. Run nfsc1.sh from the NFS mount.
5. Add bricks to the volume (add-brick).
6. Start rebalance.
7. Query rebalance status.
8. Stop rebalance.
9. Bring down one brick from each replicate set (two bricks in total), so that one brick remains online in each replica set.
10. Bring the downed bricks back online.
11. Start forced rebalance.
12. Query rebalance status.
13. Stop rebalance.

Repeat steps 9 to 13 three or four times.

14. Kill glusterd on m1.
15. Start rebalance on m2.
16. Start glusterd on m1.
17. Start rebalance on m1.

Repeat steps 9 to 17 once again.

18. Stop the volume (the volume could not be stopped).
19. killall glusterfs; killall glusterfsd; killall glusterd (this caused the crash).
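The step sequence above can be sketched with the standard gluster CLI. This is a sketch only: the hostnames (server1..server4, m1, m2) and brick paths (/export/brickN) are hypothetical placeholders for the actual test bed, and gfsc1.sh/nfsc1.sh are the workload scripts attached to this bug.

```shell
#!/bin/sh
# Create and start a 2x2 distributed-replicate volume
# (hypothetical hosts and brick paths -- adjust to the test bed).
gluster volume create vol replica 2 \
    server1:/export/brick1 server2:/export/brick2 \
    server3:/export/brick3 server4:/export/brick4
gluster volume start vol

# Mount the volume via FUSE and NFS on the clients.
mount -t glusterfs server1:/vol /mnt/fuse
mount -t nfs -o vers=3 server1:/vol /mnt/nfs

# Grow the volume by one replica pair (3 x 2 = 6 bricks afterwards).
gluster volume add-brick vol server1:/export/brick5 server2:/export/brick6

# Rebalance start / status / stop cycle.
gluster volume rebalance vol start
gluster volume rebalance vol status
gluster volume rebalance vol stop

# Take one brick per replica set offline by killing its brick process
# (repeat the pkill for one brick in each replica set), then bring the
# killed bricks back with a forced volume start.
pkill -f '/export/brick1'
gluster volume start vol force

# Forced rebalance cycle.
gluster volume rebalance vol start force
gluster volume rebalance vol status
gluster volume rebalance vol stop
```

`gluster volume start vol force` restarts only the brick processes that are down, which is the usual way to bring killed bricks back online without disturbing the bricks that are still running.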
Actual results:
rebalance process crashed. 

Additional info: Volume info after add-brick

[root@AFR-Server3 ~]# gluster volume info
Volume Name: vol
Type: Distributed-Replicate
Volume ID: 87a4d691-3cbc-4d8f-9f59-da6bb4d1fbab
Status: Started
Number of Bricks: 3 x 2 = 6
Transport-type: tcp
Options Reconfigured:
performance.stat-prefetch: off
Comment 1 Shwetha Panduranga 2012-05-12 09:55:20 EDT
Created attachment 584005 [details]
Backtrace of core
Comment 2 Shwetha Panduranga 2012-05-12 09:56:14 EDT
Created attachment 584010 [details]
Exact steps executed to recreate
Comment 3 shishir gowda 2012-09-26 01:39:53 EDT
This seems to be a duplicate of bug 826080. This bug has been fixed upstream.

*** This bug has been marked as a duplicate of bug 826080 ***
