Bug 823256 - Crash in rebalance while running kernel untar and bonnie on nfs mount
Status: CLOSED DUPLICATE of bug 826584
Product: GlusterFS
Classification: Community
Component: core
Hardware: x86_64 Linux
Priority: medium Severity: high
Assigned To: shishir gowda
Depends On:
Reported: 2012-05-20 07:31 EDT by shylesh
Modified: 2013-12-08 20:32 EST

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Last Closed: 2012-06-07 05:02:02 EDT
Type: Bug

Attachments
rebalance logs (1.77 MB, application/x-gzip)
2012-05-20 07:32 EDT, shylesh

Description shylesh 2012-05-20 07:31:28 EDT
Description of problem:
Kernel untar and bonnie were running on the NFS mount point when rebalance was started. After some time, one of the children of the distributed-replicate volume was brought down and then brought up again. Some time later, the rebalance process crashed.

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1. Created a 2x2 distribute-replicate volume.
2. On the NFS mount, started a kernel untar.
3. On a second mount of the same volume, started bonnie.
4. Brought down the first child of the first pair in the volume.
Actual results:
After some time the rebalance process crashed.

Expected results:

Additional info:

Program terminated with signal 6, Aborted.
#0  0x0000003739032885 in raise () from /lib64/libc.so.6
Missing separate debuginfos, use: debuginfo-install glibc-2.12-1.47.el6_2.9.x86_64 libgcc-4.4.6-3.el6.x86_64 openssl-1.0.0-20.el6_2.4.x86_64 zlib-1.2.3-27.el6.x86_64

(gdb) bt
#0  0x0000003739032885 in raise () from /lib64/libc.so.6
#1  0x0000003739034065 in abort () from /lib64/libc.so.6
#2  0x000000373906f977 in __libc_message () from /lib64/libc.so.6
#3  0x0000003739075296 in malloc_printerr () from /lib64/libc.so.6
#4  0x00007f0a92e79fbc in __gf_free (free_ptr=0x1fd6b20) at mem-pool.c:258
#5  0x00007f0a8e3b1418 in gf_defrag_start_crawl (data=0x1fbf3b0) at dht-rebalance.c:1494
#6  0x00007f0a92e89a96 in synctask_wrap (old_task=0x7f0a74200d70) at syncop.c:120
#7  0x0000003739043610 in ?? () from /lib64/libc.so.6
#8  0x0000000000000000 in ?? ()
Comment 1 shylesh 2012-05-20 07:32:20 EDT
Created attachment 585627
rebalance logs
Comment 2 Amar Tumballi 2012-05-20 23:25:16 EDT
Went through the log:

[2012-05-20 10:24:44.623944] E [dht-rebalance.c:1369:gf_defrag_fix_layout] 0-dis-rep-dht: Fix layout failed for /run6157
[2012-05-20 10:24:44.624088] I [dht-rebalance.c:1614:gf_defrag_status_get] 0-glusterfs: Rebalance is completed
[2012-05-20 10:24:44.624102] I [dht-rebalance.c:1617:gf_defrag_status_get] 0-glusterfs: Files migrated: 165350080, size: 218959092, lookups: 159521, failures: 26899
[2012-05-20 10:24:44.668152] W [glusterfsd.c:816:cleanup_and_exit] (-->/lib64/libc.so.6(clone+0x6d) [0x37390e5ccd] (-->/lib64/libpthread.so.0() [0x37398077f1] (-->/usr/local/sbin/glusterfs(glusterfs_sigwaiter+0xfc) [0x407ac2]))) 0-: received signum (15), shutting down
pending frames:
pending frames:

patchset: git://git.gluster.com/glusterfs.git
patchset: git://git.gluster.com/glusterfs.git
signal received: 6
signal received: 6
time of crash: 2012-05-20 10:24:44
configuration details:
argp 1

So this is a case of 'rebalance stop' getting triggered, with the rebalance crawl racing against another thread that is freeing the mem-pools and related state during shutdown.

As this is in the 'cleanup_and_exit()' part, I am reducing the priority.
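
To illustrate the kind of race described above, here is a minimal sketch in C. It assumes the abort comes from two threads releasing the same heap pointer, which is only a guess from the malloc_printerr frame in the backtrace. It is not GlusterFS code and not the fix tracked in bug 826584. The names shared_state, release_once, and run_two_releasers are hypothetical. The sketch shows one common guard: take a lock, free only if the pointer is still set, then clear it.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-in for state shared by the crawl task and the exit path. */
struct shared_state {
    pthread_mutex_t lock;
    char *buf;       /* heap memory that both paths may try to release */
    int free_calls;  /* how many times free() actually ran */
};

/* Release the buffer at most once, however many threads call this. */
static void release_once(struct shared_state *s)
{
    pthread_mutex_lock(&s->lock);
    if (s->buf != NULL) {
        free(s->buf);
        s->buf = NULL;   /* later callers see NULL and skip the free */
        s->free_calls++;
    }
    pthread_mutex_unlock(&s->lock);
}

/* Thread entry point: both threads run the same release path. */
static void *release_thread(void *arg)
{
    release_once((struct shared_state *)arg);
    return NULL;
}

/* Start two concurrent release attempts and report how many frees happened. */
int run_two_releasers(void)
{
    struct shared_state s;
    pthread_t crawl, cleanup;
    int rc;

    rc = pthread_mutex_init(&s.lock, NULL);
    assert(rc == 0);
    s.buf = malloc(64);
    assert(s.buf != NULL);
    s.free_calls = 0;

    rc = pthread_create(&crawl, NULL, release_thread, &s);
    assert(rc == 0);
    rc = pthread_create(&cleanup, NULL, release_thread, &s);
    assert(rc == 0);

    pthread_join(crawl, NULL);
    pthread_join(cleanup, NULL);
    pthread_mutex_destroy(&s.lock);
    return s.free_calls;
}
```

Calling run_two_releasers() from a small main() and checking that it returns 1 confirms that only one of the two threads performed the free. Without the lock and the NULL check, both threads could pass the same pointer to free(), which glibc typically reports through malloc_printerr followed by abort().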
Comment 3 shishir gowda 2012-06-07 05:02:02 EDT

*** This bug has been marked as a duplicate of bug 826584 ***
