Bug 861342

Summary: glusterd hangs while rebalance command issued
Product: [Community] GlusterFS
Reporter: vpshastry <vshastry>
Component: glusterd
Assignee: krishnan parthasarathi <kparthas>
Status: CLOSED CURRENTRELEASE
Severity: unspecified
Priority: unspecified
Version: mainline
CC: gluster-bugs, kparthas, nsathyan
Fixed In Version: glusterfs-3.4.0
Doc Type: Bug Fix
Last Closed: 2013-07-24 13:12:55 EDT
Type: Bug
Attachments:
Included log files, strace (flags: none)

Description vpshastry 2012-09-28 05:00:07 EDT
Created attachment 618500: Included log files, strace

Description of problem:
glusterd hangs when a rebalance is started on a distributed-stripe volume.

Version-Release number of selected component (if applicable):
master

How reproducible:
May not be reproducible on all machines.

Steps to Reproduce:

1. Create & start a distributed stripe volume
2. Mount & add some files
3. Add bricks & initiate rebalance
  
Actual results:
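The gluster volume rebalance command does not complete: glusterd blocks in waitpid() (see the backtrace under Additional info) and stops responding.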

Expected results:
gluster volume rebalance should succeed.

Additional info:
strace, backtrace, logs and volume info files are attached.

Backtrace:
(gdb) bt
#0  0x00007ffaa29ae88d in waitpid () from /lib/x86_64-linux-gnu/libpthread.so.0
#1  0x00007ffaa303c60b in runner_end_reuse (runner=0x7fff558c1790) at run.c:345
#2  0x00007ffaa303c750 in runner_run_generic (rfin=0x7ffaa303c5e0 <runner_end_reuse>, 
    runner=0x7fff558c1790) at run.c:386
#3  runner_run_reuse (runner=0x7fff558c1790) at run.c:417
#4  0x00007ffa9f25a277 in glusterd_handle_defrag_start (volinfo=0x235e2c0, 
    op_errstr=<optimized out>, len=<optimized out>, cmd=1, cbk=0) at glusterd-rebalance.c:253
#5  0x00007ffa9f25afb7 in glusterd_op_rebalance (dict=0x7ffaa0e87748, op_errstr=0x7fff558c8198, 
    rsp_dict=<optimized out>) at glusterd-rebalance.c:542
#6  0x00007ffa9f2314ee in glusterd_op_commit_perform (op=<optimized out>, dict=0x7ffaa0e87748, 
    op_errstr=0x7fff558c8198, rsp_dict=0x0) at glusterd-op-sm.c:3021
#7  0x00007ffa9f233346 in glusterd_op_ac_send_commit_op (event=<optimized out>, ctx=<optimized out>)
    at glusterd-op-sm.c:2325
#8  0x00007ffa9f230233 in glusterd_op_sm () at glusterd-op-sm.c:4602
#9  0x00007ffa9f25aafd in glusterd_handle_defrag_volume (req=0x7ffa9f19402c)
    at glusterd-rebalance.c:435
#10 0x00007ffaa2dd9be5 in rpcsvc_handle_rpc_call (svc=0x2353b60, trans=<optimized out>, 
    msg=<optimized out>) at rpcsvc.c:535
#11 0x00007ffaa2dda0f3 in rpcsvc_notify (trans=0x237eac0, mydata=<optimized out>, 
    event=<optimized out>, data=0x23992b0) at rpcsvc.c:633
#12 0x00007ffaa2ddd3f7 in rpc_transport_notify (this=<optimized out>, event=<optimized out>, 
    data=<optimized out>) at rpc-transport.c:495
#13 0x00007ffa9ef890a4 in socket_event_poll_in (this=0x237eac0) at socket.c:1986
#14 0x00007ffa9ef89814 in socket_event_handler (fd=<optimized out>, idx=<optimized out>, 
    data=0x237eac0, poll_in=1, poll_out=0, poll_err=0) at socket.c:2098
#15 0x00007ffaa3044e43 in event_dispatch_epoll_handler (i=<optimized out>, events=0x235c4d0, 
    event_pool=0x234eea0) at event-epoll.c:384
#16 event_dispatch_epoll (event_pool=0x234eea0) at event-epoll.c:445
#17 0x00000000004049d1 in main (argc=3, argv=0x7fff558c8538) at glusterfsd.c:1883
(gdb)
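
Frames #0 to #4 show glusterd_handle_defrag_start reaching waitpid() through runner_run_reuse and runner_end_reuse, on the same thread that runs event_dispatch_epoll (frame #16). While that call blocks, the thread cannot service any other event. The following is a minimal sketch of the difference between a blocking reap and a non-blocking check. It is not GlusterFS code and not necessarily the approach taken by the fix; spawn_sleeper, reap_blocking and reap_nonblocking are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork a child that stays alive for about `ms`
 * milliseconds, standing in for a spawned process that does not exit
 * promptly. Returns the child's pid, or -1 if fork() fails. */
static pid_t spawn_sleeper(int ms)
{
    assert(ms >= 0);
    pid_t pid = fork();
    if (pid == 0) {
        poll(NULL, 0, ms);   /* sleep without touching any descriptors */
        _exit(0);
    }
    return pid;
}

/* Blocking reap: the caller is stuck here until the child exits.
 * This is the call shape seen at frame #0 of the backtrace.
 * Returns the wait status, or -1 on error. */
static int reap_blocking(pid_t pid)
{
    int status = 0;
    pid_t r;
    do {
        r = waitpid(pid, &status, 0);
    } while (r == -1 && errno == EINTR);
    return r == pid ? status : -1;
}

/* Non-blocking check: returns 1 if the child was reaped (status filled in),
 * 0 if it is still running, -1 on error. The caller returns to its event
 * loop and checks again later instead of waiting. */
static int reap_nonblocking(pid_t pid, int *status)
{
    pid_t r = waitpid(pid, status, WNOHANG);
    if (r == 0)
        return 0;
    return r == pid ? 1 : -1;
}
```

A caller that must stay responsive would use reap_nonblocking (or handle SIGCHLD) and keep dispatching events; reap_blocking is safe only when the child is known to exit quickly.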


Volume info:
type=1
count=6
status=1
sub_count=2
stripe_count=2
replica_count=1
version=11
transport-type=0
volume-id=01cc6f21-859d-4da1-b041-3f68c1987022
username=b956693c-5d9e-4bc3-86e7-42f9aac726ec
password=5fabdb6e-683e-4e7b-aebc-0a6b08c6d3e5
performance.io-cache=off
diagnostics.brick-log-level=DEBUG
diagnostics.client-log-level=DEBUG
performance.io-thread-count=1
performance.quick-read=off
performance.write-behind=off
performance.read-ahead=off
brick-0=vpshastry:-export1-b1
brick-1=vpshastry:-export2-b1
brick-2=vpshastry:-export1-b2
brick-3=vpshastry:-export2-b2
brick-4=vpshastry:-export1-b3
brick-5=vpshastry:-export2-b3

The rebalance log was empty.

Comment 1 krishnan parthasarathi 2012-10-11 05:07:29 EDT
http://review.gluster.com/4024 fixes the issue on master.