Bug 862981

Summary: [RHEV-RHS] Crash in rebalance process
Product: [Red Hat Storage] Red Hat Gluster Storage
Reporter: shylesh <shmohan>
Component: glusterfs
Assignee: Raghavendra Bhat <rabhat>
Status: CLOSED CURRENTRELEASE
QA Contact: shylesh <shmohan>
Severity: high
Docs Contact:
Priority: high
Version: unspecified
CC: grajaiya, rhs-bugs, rwheeler, surs, vbellur, vbhat
Target Milestone: ---
Target Release: ---
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version: glusterfs-3.3.0rhsvirt1-7.el6rhs
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2015-08-10 07:46:00 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 875076
Bug Blocks:

Description shylesh 2012-10-04 05:56:59 UTC
Description of problem:
The rebalance process crashed after glusterd was restarted while a rebalance was running.

Version-Release number of selected component (if applicable):
[root@rhs-gp-srv4 core]# rpm -qa | grep gluster
glusterfs-fuse-3.3.0rhsvirt1-6.el6rhs.x86_64
glusterfs-debuginfo-3.3.0rhsvirt1-6.el6rhs.x86_64
vdsm-gluster-4.9.6-14.el6rhs.noarch
gluster-swift-plugin-1.0-5.noarch
gluster-swift-container-1.4.8-4.el6.noarch
org.apache.hadoop.fs.glusterfs-glusterfs-0.20.2_0.2-1.noarch
glusterfs-devel-3.3.0rhsvirt1-6.el6rhs.x86_64
glusterfs-3.3.0rhsvirt1-6.el6rhs.x86_64
glusterfs-server-3.3.0rhsvirt1-6.el6rhs.x86_64
glusterfs-rdma-3.3.0rhsvirt1-6.el6rhs.x86_64
gluster-swift-proxy-1.4.8-4.el6.noarch
gluster-swift-account-1.4.8-4.el6.noarch
gluster-swift-doc-1.4.8-4.el6.noarch
glusterfs-geo-replication-3.3.0rhsvirt1-6.el6rhs.x86_64
gluster-swift-1.4.8-4.el6.noarch
gluster-swift-object-1.4.8-4.el6.noarch

Steps to Reproduce:
1. Created a single-brick distribute volume serving as a VM store.
2. Added one more brick and started a rebalance.
3. Restarted glusterd while the rebalance was in progress.

Actual results:
The rebalance process crashed with signal 6 (SIGABRT).

Additional info:
Core was generated by `/usr/sbin/glusterfs -s localhost --volfile-id rebal --xlator-option *dht.use-re'.
Program terminated with signal 6, Aborted.
#0  0x0000003910a32885 in raise () from /lib64/libc.so.6
Missing separate debuginfos, use: debuginfo-install glibc-2.12-1.47.el6_2.12.x86_64 libgcc-4.4.6-3.el6.x86_64 openssl-1.0.0-20.el6_2.5.x86_64 zlib-1.2.3-27.el6.x86_64

bt
====
(gdb) bt
#0  0x0000003910a32885 in raise () from /lib64/libc.so.6
#1  0x0000003910a34065 in abort () from /lib64/libc.so.6
#2  0x0000003910a6f977 in __libc_message () from /lib64/libc.so.6
#3  0x0000003910a75296 in malloc_printerr () from /lib64/libc.so.6
#4  0x00007f5bb2faa707 in gf_defrag_start_crawl (data=<value optimized out>) at dht-rebalance.c:1499
#5  0x000000397ee4bd72 in synctask_wrap (old_task=<value optimized out>) at syncop.c:120
#6  0x0000003910a43610 in ?? () from /lib64/libc.so.6
#7  0x0000000000000000 in ?? ()

(gdb) f 4 
#4  0x00007f5bb2faa707 in gf_defrag_start_crawl (data=<value optimized out>) at dht-rebalance.c:1499
1499                    GF_FREE (defrag);
(gdb) l
1494                    defrag->is_exiting = 1;
1495            }
1496            UNLOCK (&defrag->lock);
1497
1498            if (defrag)
1499                    GF_FREE (defrag);
1500
1501            return ret;
1502    }
1503