Bug 849301 - rebalance process crash
Status: CLOSED ERRATA
Product: Red Hat Gluster Storage
Classification: Red Hat
Component: glusterfs
Version: 2.0
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: high
Assigned To: shishir gowda
QA Contact: shylesh
Depends On: 821148
Blocks: 858467
Reported: 2012-08-17 22:24 EDT by Vidya Sakar
Modified: 2013-12-08 20:33 EST
CC List: 7 users

See Also:
Fixed In Version: glusterfs-3.4.0qa5-1
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: 821148
Clones: 858467
Environment:
Last Closed: 2013-09-23 18:33:07 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments

None
Description Vidya Sakar 2012-08-17 22:24:05 EDT
+++ This bug was initially created as a clone of Bug #821148 +++

Created attachment 584004 [details]
rebalance log

Description of problem:
------------------------

Core was generated by `/usr/local/sbin/glusterfs -s localhost --volfile-id vol --xlator-option *dht.us'.
Program terminated with signal 6, Aborted.
#0  0x0000003271e32885 in raise () from /lib64/libc.so.6
Missing separate debuginfos, use: debuginfo-install glibc-2.12-1.47.el6.x86_64 libgcc-4.4.6-3.el6.x86_64 openssl-1.0.0-20.el6.x86_64 zlib-1.2.3-27.el6.x86_64
(gdb) bt
#0  0x0000003271e32885 in raise () from /lib64/libc.so.6
#1  0x0000003271e34065 in abort () from /lib64/libc.so.6
#2  0x0000003271e2b9fe in __assert_fail_base () from /lib64/libc.so.6
#3  0x0000003271e2bac0 in __assert_fail () from /lib64/libc.so.6
#4  0x00007f22f25fad6e in __inode_path (inode=0x7f22e54700e0, name=0x0, bufp=0x7fffaa45b278) at inode.c:1090
#5  0x00007f22f25fb156 in inode_path (inode=0x7f22e54700e0, name=0x0, bufp=0x7fffaa45b278) at inode.c:1191
#6  0x00007f22ee05ab1d in protocol_client_reopendir (this=0x10b1570, fdctx=0x11ba490) at client-handshake.c:1096
#7  0x00007f22ee05b358 in client_post_handshake (frame=0x7f22f120c0b8, this=0x10b1570) at client-handshake.c:1281
#8  0x00007f22ee05bb98 in client_setvolume_cbk (req=0x7f22e685b04c, iov=0x7f22e685b08c, count=1, myframe=0x7f22f120c0b8) at client-handshake.c:1439
#9  0x00007f22f23bca48 in rpc_clnt_handle_reply (clnt=0x1139340, pollin=0x11c6060) at rpc-clnt.c:788
#10 0x00007f22f23bcde5 in rpc_clnt_notify (trans=0x1148ec0, mydata=0x1139370, event=RPC_TRANSPORT_MSG_RECEIVED, data=0x11c6060) at rpc-clnt.c:907
#11 0x00007f22f23b8ec8 in rpc_transport_notify (this=0x1148ec0, event=RPC_TRANSPORT_MSG_RECEIVED, data=0x11c6060) at rpc-transport.c:489
#12 0x00007f22eee8e280 in socket_event_poll_in (this=0x1148ec0) at socket.c:1677
#13 0x00007f22eee8e804 in socket_event_handler (fd=13, idx=8, data=0x1148ec0, poll_in=1, poll_out=0, poll_err=0) at socket.c:1792
#14 0x00007f22f2613de8 in event_dispatch_epoll_handler (event_pool=0x108e500, events=0x10a7fd0, i=0) at event.c:785
#15 0x00007f22f261400b in event_dispatch_epoll (event_pool=0x108e500) at event.c:847
#16 0x00007f22f2614396 in event_dispatch (event_pool=0x108e500) at event.c:947
#17 0x0000000000408461 in main (argc=27, argv=0x7fffaa45b938) at glusterfsd.c:1674
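
Reading the trace: the raise/abort/__assert_fail frames at the top show the process died on a failed assert() (SIGABRT, signal 6). The assertion is hit in __inode_path() at inode.c:1090 while the protocol/client translator re-opens directory fds after the volume handshake (client_setvolume_cbk -> client_post_handshake -> protocol_client_reopendir -> inode_path() with name=0x0). A minimal sketch of how the trace above can be regenerated from the core, assuming the core file path below stands in for the one actually dumped by the rebalance process:

debuginfo-install glibc-2.12-1.47.el6.x86_64 libgcc-4.4.6-3.el6.x86_64 openssl-1.0.0-20.el6.x86_64 zlib-1.2.3-27.el6.x86_64
gdb /usr/local/sbin/glusterfs /path/to/core    # load the rebalance binary together with the dumped core
(gdb) bt                                       # prints the backtrace shown above
(gdb) frame 4                                  # select the failing __inode_path() frame
(gdb) print *inode                             # inspect the inode whose path could not be built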


Version-Release number of selected component (if applicable):
------------------------------------------------------------
3.3.0qa41

How reproducible:
-------------------


Steps to Reproduce:
---------------------
1. Create a 2x2 distributed-replicate volume and start it.
2. Create FUSE and NFS mounts.
3. Run gfsc1.sh from the FUSE mount.
4. Run nfsc1.sh from the NFS mount.
5. Add bricks to the volume (add-brick).
6. Start rebalance.
7. Check rebalance status.
8. Stop rebalance.
9. Bring down one brick from each replica set, so that only one brick stays online in each replica set.
10. Bring the downed bricks back online.
11. Start a forced rebalance.
12. Query rebalance status.
13. Stop rebalance.

Repeat steps 9 to 13 three to four times (see the CLI sketch after this list).

14. Kill glusterd on m1.
15. Start rebalance on m2.
16. Start glusterd on m1.
17. Start rebalance on m1.

Repeat steps 9 to 17 once more.

18. Stop the volume (the volume could not be stopped).
19. killall glusterfs; killall glusterfsd; killall glusterd (this caused the crash).
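
For reference, a rough sketch of the gluster CLI behind the add-brick and rebalance steps above (steps 5-13). The volume name and brick paths are taken from the volume info further below; the brick PIDs and the gfsc1.sh/nfsc1.sh workload scripts are specific to the original setup and stand in as placeholders here:

# Step 5: add a new replica pair (paths as in the volume info below)
gluster volume add-brick vol 10.16.159.196:/export_d1/dir1 10.16.159.188:/export_d1/dir1

# Steps 6-8 (and 12-13): drive rebalance
gluster volume rebalance vol start
gluster volume rebalance vol status
gluster volume rebalance vol stop

# Step 9: take one brick per replica set offline by killing its glusterfsd process
kill -9 <pid-of-glusterfsd-for-the-brick>      # repeat for one brick in each replica set

# Step 10: bring the killed bricks back online
gluster volume start vol force

# Step 11: forced rebalance once the bricks are back
gluster volume rebalance vol start force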
  
Actual results:
--------------
The rebalance process crashed.


Additional info: Volume info after add-brick
----------------------------------------------

[root@AFR-Server3 ~]# gluster volume info
 
Volume Name: vol
Type: Distributed-Replicate
Volume ID: 87a4d691-3cbc-4d8f-9f59-da6bb4d1fbab
Status: Started
Number of Bricks: 3 x 2 = 6
Transport-type: tcp
Bricks:
Brick1: 10.16.159.196:/export_b1/dir1
Brick2: 10.16.159.196:/export_c1/dir1
Brick3: 10.16.159.188:/export_b1/dir1
Brick4: 10.16.159.188:/export_c1/dir1
Brick5: 10.16.159.196:/export_d1/dir1
Brick6: 10.16.159.188:/export_d1/dir1
Options Reconfigured:
performance.stat-prefetch: off

--- Additional comment from shwetha.h.panduranga@redhat.com on 2012-05-12 09:55:20 EDT ---

Created attachment 584005 [details]
Backtrace of core

--- Additional comment from shwetha.h.panduranga@redhat.com on 2012-05-12 09:56:14 EDT ---

Created attachment 584010 [details]
Exact steps executed to recreate
Comment 2 shishir gowda 2012-09-26 02:04:13 EDT
This looks like a duplicate of bug 859387. The fix has gone into the glusterfs-3.3.0rhsvirt1-6.el6rhs release and will be available in RHS Update 4.
Comment 4 senaik 2013-07-04 06:37:11 EDT
Version: 3.4.0.12rhs.beta1-1.el6rhs.x86_64
=========

Repeated the steps listed under 'Steps to Reproduce'; no rebalance crash was observed.
Marking the bug as 'Verified'.
Comment 5 Scott Haines 2013-09-23 18:33:07 EDT
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. 

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2013-1262.html
