Bug 849301 - rebalance process crash
Status: CLOSED ERRATA
Product: Red Hat Gluster Storage
Classification: Red Hat
Component: glusterfs
Version: 2.0
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: high
Assigned To: shishir gowda
QA Contact: shylesh
Depends On: 821148
Blocks: 858467
Reported: 2012-08-17 22:24 EDT by Vidya Sakar
Modified: 2013-12-08 20:33 EST
CC List: 7 users

See Also:
Fixed In Version: glusterfs-3.4.0qa5-1
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: 821148
Clones: 858467
Environment:
Last Closed: 2013-09-23 18:33:07 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments

None
Description Vidya Sakar 2012-08-17 22:24:05 EDT
+++ This bug was initially created as a clone of Bug #821148 +++

Created attachment 584004 [details]
rebalance log

Description of problem:
------------------------

Core was generated by `/usr/local/sbin/glusterfs -s localhost --volfile-id vol --xlator-option *dht.us'.
Program terminated with signal 6, Aborted.
#0  0x0000003271e32885 in raise () from /lib64/libc.so.6
Missing separate debuginfos, use: debuginfo-install glibc-2.12-1.47.el6.x86_64 libgcc-4.4.6-3.el6.x86_64 openssl-1.0.0-20.el6.x86_64 zlib-1.2.3-27.el6.x86_64
(gdb) bt
#0  0x0000003271e32885 in raise () from /lib64/libc.so.6
#1  0x0000003271e34065 in abort () from /lib64/libc.so.6
#2  0x0000003271e2b9fe in __assert_fail_base () from /lib64/libc.so.6
#3  0x0000003271e2bac0 in __assert_fail () from /lib64/libc.so.6
#4  0x00007f22f25fad6e in __inode_path (inode=0x7f22e54700e0, name=0x0, bufp=0x7fffaa45b278) at inode.c:1090
#5  0x00007f22f25fb156 in inode_path (inode=0x7f22e54700e0, name=0x0, bufp=0x7fffaa45b278) at inode.c:1191
#6  0x00007f22ee05ab1d in protocol_client_reopendir (this=0x10b1570, fdctx=0x11ba490) at client-handshake.c:1096
#7  0x00007f22ee05b358 in client_post_handshake (frame=0x7f22f120c0b8, this=0x10b1570) at client-handshake.c:1281
#8  0x00007f22ee05bb98 in client_setvolume_cbk (req=0x7f22e685b04c, iov=0x7f22e685b08c, count=1, myframe=0x7f22f120c0b8) at client-handshake.c:1439
#9  0x00007f22f23bca48 in rpc_clnt_handle_reply (clnt=0x1139340, pollin=0x11c6060) at rpc-clnt.c:788
#10 0x00007f22f23bcde5 in rpc_clnt_notify (trans=0x1148ec0, mydata=0x1139370, event=RPC_TRANSPORT_MSG_RECEIVED, data=0x11c6060) at rpc-clnt.c:907
#11 0x00007f22f23b8ec8 in rpc_transport_notify (this=0x1148ec0, event=RPC_TRANSPORT_MSG_RECEIVED, data=0x11c6060) at rpc-transport.c:489
#12 0x00007f22eee8e280 in socket_event_poll_in (this=0x1148ec0) at socket.c:1677
#13 0x00007f22eee8e804 in socket_event_handler (fd=13, idx=8, data=0x1148ec0, poll_in=1, poll_out=0, poll_err=0) at socket.c:1792
#14 0x00007f22f2613de8 in event_dispatch_epoll_handler (event_pool=0x108e500, events=0x10a7fd0, i=0) at event.c:785
#15 0x00007f22f261400b in event_dispatch_epoll (event_pool=0x108e500) at event.c:847
#16 0x00007f22f2614396 in event_dispatch (event_pool=0x108e500) at event.c:947
#17 0x0000000000408461 in main (argc=27, argv=0x7fffaa45b938) at glusterfsd.c:1674
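
Reading the trace: the raise/abort/__assert_fail frames at the top show the process died on a failed assert() (SIGABRT, signal 6). The assertion is hit in __inode_path() at inode.c:1090 while the protocol/client translator re-opens directory fds after the volume handshake (client_setvolume_cbk -> client_post_handshake -> protocol_client_reopendir -> inode_path() with name=0x0). A minimal sketch of how the trace above can be regenerated from the core, assuming the core file path below stands in for the one actually dumped by the rebalance process:

debuginfo-install glibc-2.12-1.47.el6.x86_64 libgcc-4.4.6-3.el6.x86_64 openssl-1.0.0-20.el6.x86_64 zlib-1.2.3-27.el6.x86_64
gdb /usr/local/sbin/glusterfs /path/to/core    # load the rebalance binary together with the dumped core
(gdb) bt                                       # prints the backtrace shown above
(gdb) frame 4                                  # select the failing __inode_path() frame
(gdb) print *inode                             # inspect the inode whose path could not be built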


Version-Release number of selected component (if applicable):
------------------------------------------------------------
3.3.0qa41

How reproducible:
-------------------


Steps to Reproduce:
---------------------
1. Create a 2x2 distributed-replicate volume and start it.
2. Create FUSE and NFS mounts.
3. Run gfsc1.sh from the FUSE mount.
4. Run nfsc1.sh from the NFS mount.
5. Add bricks to the volume (add-brick).
6. Start rebalance.
7. Check rebalance status.
8. Stop rebalance.
9. Bring down one brick from each replica set, so that only one brick stays online in each replica set.
10. Bring the downed bricks back online.
11. Start a forced rebalance.
12. Query rebalance status.
13. Stop rebalance.

Repeat steps 9 to 13 three to four times (see the CLI sketch after this list).

14. Kill glusterd on m1.
15. Start rebalance on m2.
16. Start glusterd on m1.
17. Start rebalance on m1.

Repeat steps 9 to 17 once more.

18. Stop the volume (the volume could not be stopped).
19. killall glusterfs; killall glusterfsd; killall glusterd (this caused the crash).
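
For reference, a rough sketch of the gluster CLI behind the add-brick and rebalance steps above (steps 5-13). The volume name and brick paths are taken from the volume info further below; the brick PIDs and the gfsc1.sh/nfsc1.sh workload scripts are specific to the original setup and stand in as placeholders here:

# Step 5: add a new replica pair (paths as in the volume info below)
gluster volume add-brick vol 10.16.159.196:/export_d1/dir1 10.16.159.188:/export_d1/dir1

# Steps 6-8 (and 12-13): drive rebalance
gluster volume rebalance vol start
gluster volume rebalance vol status
gluster volume rebalance vol stop

# Step 9: take one brick per replica set offline by killing its glusterfsd process
kill -9 <pid-of-glusterfsd-for-the-brick>      # repeat for one brick in each replica set

# Step 10: bring the killed bricks back online
gluster volume start vol force

# Step 11: forced rebalance once the bricks are back
gluster volume rebalance vol start force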
  
Actual results:
--------------
The rebalance process crashed.


Additional info: Volume info after add-brick
----------------------------------------------

[root@AFR-Server3 ~]# gluster volume info
 
Volume Name: vol
Type: Distributed-Replicate
Volume ID: 87a4d691-3cbc-4d8f-9f59-da6bb4d1fbab
Status: Started
Number of Bricks: 3 x 2 = 6
Transport-type: tcp
Bricks:
Brick1: 10.16.159.196:/export_b1/dir1
Brick2: 10.16.159.196:/export_c1/dir1
Brick3: 10.16.159.188:/export_b1/dir1
Brick4: 10.16.159.188:/export_c1/dir1
Brick5: 10.16.159.196:/export_d1/dir1
Brick6: 10.16.159.188:/export_d1/dir1
Options Reconfigured:
performance.stat-prefetch: off

--- Additional comment from shwetha.h.panduranga@redhat.com on 2012-05-12 09:55:20 EDT ---

Created attachment 584005 [details]
Backtrace of core

--- Additional comment from shwetha.h.panduranga@redhat.com on 2012-05-12 09:56:14 EDT ---

Created attachment 584010 [details]
Exact steps executed to recreate
Comment 2 shishir gowda 2012-09-26 02:04:13 EDT
This looks like a duplicate of bug 859387. The fix has gone into the glusterfs-3.3.0rhsvirt1-6.el6rhs release and will be available in RHS Update 4.
Comment 4 senaik 2013-07-04 06:37:11 EDT
Version: 3.4.0.12rhs.beta1-1.el6rhs.x86_64
=========

Repeated the steps listed under 'Steps to Reproduce'; no rebalance crash was observed.
Marking the bug as 'Verified'.
Comment 5 Scott Haines 2013-09-23 18:33:07 EDT
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. 

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2013-1262.html
