Bug 862981 - [RHEV-RHS] Crash in rebalance process
Summary: [RHEV-RHS] Crash in rebalance process
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: glusterfs
Version: unspecified
Hardware: x86_64
OS: Linux
Priority: high
Severity: high
Target Milestone: ---
Target Release: ---
Assignee: Raghavendra Bhat
QA Contact: shylesh
URL:
Whiteboard:
Depends On: 875076
Blocks:
Reported: 2012-10-04 05:56 UTC by shylesh
Modified: 2015-08-10 19:30 UTC

Fixed In Version: glusterfs-3.3.0rhsvirt1-7.el6rhs
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2015-08-10 07:46:00 UTC
Embargoed:



Description shylesh 2012-10-04 05:56:59 UTC
Description of problem:
The rebalance process crashed when glusterd was restarted while a rebalance was in progress.

Version-Release number of selected component (if applicable):
[root@rhs-gp-srv4 core]# rpm -qa | grep gluster
glusterfs-fuse-3.3.0rhsvirt1-6.el6rhs.x86_64
glusterfs-debuginfo-3.3.0rhsvirt1-6.el6rhs.x86_64
vdsm-gluster-4.9.6-14.el6rhs.noarch
gluster-swift-plugin-1.0-5.noarch
gluster-swift-container-1.4.8-4.el6.noarch
org.apache.hadoop.fs.glusterfs-glusterfs-0.20.2_0.2-1.noarch
glusterfs-devel-3.3.0rhsvirt1-6.el6rhs.x86_64
glusterfs-3.3.0rhsvirt1-6.el6rhs.x86_64
glusterfs-server-3.3.0rhsvirt1-6.el6rhs.x86_64
glusterfs-rdma-3.3.0rhsvirt1-6.el6rhs.x86_64
gluster-swift-proxy-1.4.8-4.el6.noarch
gluster-swift-account-1.4.8-4.el6.noarch
gluster-swift-doc-1.4.8-4.el6.noarch
glusterfs-geo-replication-3.3.0rhsvirt1-6.el6rhs.x86_64
gluster-swift-1.4.8-4.el6.noarch
gluster-swift-object-1.4.8-4.el6.noarch


 


Steps to Reproduce:
1. Create a single-brick distribute volume that serves as a VM store.
2. Add one more brick and start a rebalance.
3. While the rebalance is running, restart glusterd.
  
Actual results:
The rebalance process crashes.
 


Additional info:
Core was generated by `/usr/sbin/glusterfs -s localhost --volfile-id rebal --xlator-option *dht.use-re'.
Program terminated with signal 6, Aborted.
#0  0x0000003910a32885 in raise () from /lib64/libc.so.6
Missing separate debuginfos, use: debuginfo-install glibc-2.12-1.47.el6_2.12.x86_64 libgcc-4.4.6-3.el6.x86_64 openssl-1.0.0-20.el6_2.5.x86_64 zlib-1.2.3-27.el6.x86_64


bt
====
(gdb) bt
#0  0x0000003910a32885 in raise () from /lib64/libc.so.6
#1  0x0000003910a34065 in abort () from /lib64/libc.so.6
#2  0x0000003910a6f977 in __libc_message () from /lib64/libc.so.6
#3  0x0000003910a75296 in malloc_printerr () from /lib64/libc.so.6
#4  0x00007f5bb2faa707 in gf_defrag_start_crawl (data=<value optimized out>) at dht-rebalance.c:1499
#5  0x000000397ee4bd72 in synctask_wrap (old_task=<value optimized out>) at syncop.c:120
#6  0x0000003910a43610 in ?? () from /lib64/libc.so.6
#7  0x0000000000000000 in ?? ()



(gdb) f 4 
#4  0x00007f5bb2faa707 in gf_defrag_start_crawl (data=<value optimized out>) at dht-rebalance.c:1499
1499                    GF_FREE (defrag);
(gdb) l
1494                    defrag->is_exiting = 1;
1495            }
1496            UNLOCK (&defrag->lock);
1497
1498            if (defrag)
1499                    GF_FREE (defrag);
1500
1501            return ret;
1502    }
1503
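
The backtrace shows glibc aborting from malloc_printerr inside the GF_FREE call at line 1499, which glibc typically does when it detects a double free or an invalid pointer; this report does not state the root cause. The listing also dereferences defrag in UNLOCK at line 1496 before the NULL check at line 1498. The following is a minimal C sketch, assuming the state is shared between two owners, of one way to ensure such a structure is freed exactly once. All names here (defrag_info, defrag_new, defrag_ref, defrag_unref, frees_performed) are hypothetical and are neither the glusterfs API nor the actual fix. The sketch is single-threaded; real code would need to protect the counter with a lock.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for the rebalance state; NOT the glusterfs struct. */
struct defrag_info {
    int refcount; /* number of owners still holding a pointer */
};

/* Counts real calls to free(), so the "freed exactly once" claim is checkable. */
static int frees_performed = 0;

/* Allocate the state with a single owner. */
static struct defrag_info *defrag_new(void)
{
    struct defrag_info *d = calloc(1, sizeof(*d));
    if (d == NULL)
        return NULL;
    d->refcount = 1;
    return d;
}

/* Register an additional owner and hand back the same pointer. */
static struct defrag_info *defrag_ref(struct defrag_info *d)
{
    if (d != NULL)
        d->refcount++;
    return d;
}

/*
 * Drop one owner. The caller's pointer is cleared first, so calling this
 * again through the same variable is a harmless no-op instead of a second
 * free. The NULL check happens before any dereference.
 * Returns 1 if the memory was actually freed, 0 otherwise.
 */
static int defrag_unref(struct defrag_info **dp)
{
    struct defrag_info *d;

    if (dp == NULL || *dp == NULL)
        return 0;

    d = *dp;
    *dp = NULL;

    if (--d->refcount > 0)
        return 0;

    free(d);
    frees_performed++;
    return 1;
}
```

With this pattern, two code paths that each hold a reference can both release it in either order, and only the last release frees the memory.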

