Bug 823256 - Crash in rebalance while running kernel untar and bonnie on nfs mount
Summary: Crash in rebalance while running kernel untar and bonnie on nfs mount
Keywords:
Status: CLOSED DUPLICATE of bug 826584
Alias: None
Product: GlusterFS
Classification: Community
Component: core
Version: pre-release
Hardware: x86_64
OS: Linux
Priority: medium
Severity: high
Target Milestone: ---
Assignee: shishir gowda
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2012-05-20 11:31 UTC by shylesh
Modified: 2013-12-09 01:32 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2012-06-07 09:02:02 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:


Attachments
rebalance logs (1.77 MB, application/x-gzip)
2012-05-20 11:32 UTC, shylesh

Description shylesh 2012-05-20 11:31:28 UTC
Description of problem:
Ran a kernel untar and bonnie on the NFS mount point and started a rebalance. After some time, brought down one of the children of the distributed-replicate volume and brought it back up. Some time later the rebalance process crashed.

Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1. Created a 2x2 distribute-replicate volume.
2. On the NFS mount, started a kernel untar.
3. On a second mount of the same volume, started bonnie.
4. Brought down the first child of the first pair in the volume.
 
Actual results:
After some time the rebalance process crashed.

Expected results:


Additional info:
===============

Program terminated with signal 6, Aborted.
#0  0x0000003739032885 in raise () from /lib64/libc.so.6
Missing separate debuginfos, use: debuginfo-install glibc-2.12-1.47.el6_2.9.x86_64 libgcc-4.4.6-3.el6.x86_64 openssl-1.0.0-20.el6_2.4.x86_64 zlib-1.2.3-27.el6.x86_64



bt
====
(gdb) bt
#0  0x0000003739032885 in raise () from /lib64/libc.so.6
#1  0x0000003739034065 in abort () from /lib64/libc.so.6
#2  0x000000373906f977 in __libc_message () from /lib64/libc.so.6
#3  0x0000003739075296 in malloc_printerr () from /lib64/libc.so.6
#4  0x00007f0a92e79fbc in __gf_free (free_ptr=0x1fd6b20) at mem-pool.c:258
#5  0x00007f0a8e3b1418 in gf_defrag_start_crawl (data=0x1fbf3b0) at dht-rebalance.c:1494
#6  0x00007f0a92e89a96 in synctask_wrap (old_task=0x7f0a74200d70) at syncop.c:120
#7  0x0000003739043610 in ?? () from /lib64/libc.so.6
#8  0x0000000000000000 in ?? ()

Comment 1 shylesh 2012-05-20 11:32:20 UTC
Created attachment 585627
rebalance logs

Comment 2 Amar Tumballi 2012-05-21 03:25:16 UTC
Went through the log:

[2012-05-20 10:24:44.623944] E [dht-rebalance.c:1369:gf_defrag_fix_layout] 0-dis-rep-dht: Fix layout failed for /run6157
[2012-05-20 10:24:44.624088] I [dht-rebalance.c:1614:gf_defrag_status_get] 0-glusterfs: Rebalance is completed
[2012-05-20 10:24:44.624102] I [dht-rebalance.c:1617:gf_defrag_status_get] 0-glusterfs: Files migrated: 165350080, size: 218959092, lookups: 159521, failures: 26899
[2012-05-20 10:24:44.668152] W [glusterfsd.c:816:cleanup_and_exit] (-->/lib64/libc.so.6(clone+0x6d) [0x37390e5ccd] (-->/lib64/libpthread.so.0() [0x37398077f1] (-->/usr/local/sbin/glusterfs(glusterfs_sigwaiter+0xfc) [0x407ac2]))) 0-: received signum (15), shutting down
pending frames:
pending frames:


patchset: git://git.gluster.com/glusterfs.git
patchset: git://git.gluster.com/glusterfs.git
signal received: 6
signal received: 6
time of crash: 2012-05-20 10:24:44
configuration details:
argp 1


So this is a case of 'rebalance stop' being triggered, followed by a race with another thread that is freeing mem-pools and other state.

As this is in the 'cleanup_and_exit()' path, I am reducing the priority.
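
The log triage above can be reproduced with a small filter that keeps only the error and warning entries. This is a minimal sketch, not a GlusterFS tool: the line format it assumes (`[timestamp] LEVEL [file:line:function] message`) is inferred only from the lines quoted in this comment, and the names `LogEntry`, `parse_line`, `problem_entries` and `SAMPLE` are hypothetical.

```python
import re
from typing import Iterable, List, NamedTuple, Optional

# Assumed line shape, inferred from the log lines quoted in this report:
#   [timestamp] LEVEL [file:line:function] message
LOG_LINE = re.compile(
    r"^\[(?P<timestamp>[^\]]+)\]\s+"
    r"(?P<level>[A-Z])\s+"
    r"\[(?P<source>[^:\]]+):(?P<line>\d+):(?P<function>[^\]]+)\]\s+"
    r"(?P<message>.*)$"
)


class LogEntry(NamedTuple):
    timestamp: str
    level: str
    source: str
    line: int
    function: str
    message: str


def parse_line(text: str) -> Optional[LogEntry]:
    """Return a LogEntry, or None for lines that do not match the assumed shape."""
    match = LOG_LINE.match(text.strip())
    if match is None:
        return None
    return LogEntry(
        timestamp=match.group("timestamp"),
        level=match.group("level"),
        source=match.group("source"),
        line=int(match.group("line")),
        function=match.group("function"),
        message=match.group("message"),
    )


def problem_entries(lines: Iterable[str], levels=("E", "W")) -> List[LogEntry]:
    """Keep only parsed entries whose level is in `levels` (errors and warnings by default)."""
    parsed = (parse_line(text) for text in lines)
    return [entry for entry in parsed if entry is not None and entry.level in levels]


# Lines copied from the log excerpt in this comment.
SAMPLE = [
    "[2012-05-20 10:24:44.623944] E [dht-rebalance.c:1369:gf_defrag_fix_layout] "
    "0-dis-rep-dht: Fix layout failed for /run6157",
    "[2012-05-20 10:24:44.624088] I [dht-rebalance.c:1614:gf_defrag_status_get] "
    "0-glusterfs: Rebalance is completed",
    "[2012-05-20 10:24:44.668152] W [glusterfsd.c:816:cleanup_and_exit] "
    "(-->/lib64/libc.so.6(clone+0x6d) [0x37390e5ccd] "
    "(-->/lib64/libpthread.so.0() [0x37398077f1] "
    "(-->/usr/local/sbin/glusterfs(glusterfs_sigwaiter+0xfc) [0x407ac2]))) "
    "0-: received signum (15), shutting down",
    "pending frames:",
]

if __name__ == "__main__":
    for entry in problem_entries(SAMPLE):
        print(entry.timestamp, entry.level, f"{entry.source}:{entry.line}", entry.function)
```

Lines that do not fit the assumed shape, such as the crash banner ("pending frames:", "signal received: 6"), are skipped rather than guessed at, so the banner still has to be read by hand.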

Comment 3 shishir gowda 2012-06-07 09:02:02 UTC

*** This bug has been marked as a duplicate of bug 826584 ***

