Bug 810103 - rebalance process crashed
Status: CLOSED DUPLICATE of bug 808977
Product: GlusterFS
Classification: Community
Component: distribute
Version: mainline
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Assigned To: shishir gowda
Depends On:
Blocks:
Reported: 2012-04-05 03:25 EDT by Shwetha Panduranga
Modified: 2013-12-08 20:30 EST
CC List: 2 users

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2012-05-08 00:35:53 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments
rebalance log (860.54 KB, text/x-log)
2012-04-05 08:02 EDT, Shwetha Panduranga

Description Shwetha Panduranga 2012-04-05 03:25:33 EDT
Description of problem:
(gdb) bt full
#0  0x00000032f1e32885 in raise () from /lib64/libc.so.6
No symbol table info available.
#1  0x00000032f1e34065 in abort () from /lib64/libc.so.6
No symbol table info available.
#2  0x00000032f1e2b9fe in __assert_fail_base () from /lib64/libc.so.6
No symbol table info available.
#3  0x00000032f1e2bac0 in __assert_fail () from /lib64/libc.so.6
No symbol table info available.
#4  0x00007fe315a0e6ab in __gf_free (free_ptr=0x1443400) at mem-pool.c:278
        req_size = 0
        ptr = 0x14433f4 ""
        type = 0
        xl = 0x0
        __PRETTY_FUNCTION__ = "__gf_free"
#5  0x00007fe310ff32e3 in gf_defrag_start_crawl (data=0x1425b00) at dht-rebalance.c:1485
        this = 0x1425b00
        conf = 0x14432f0
        defrag = 0x1443400
        ret = -1
        loc = {path = 0x7fe311032eaf "/", name = 0x0, inode = 0x7fe2d161a04c, parent = 0x0, gfid = '\000' <repeats 15 times>, "\001", 
          pargfid = '\000' <repeats 15 times>}
        iatt = {ia_ino = 0, ia_gfid = '\000' <repeats 15 times>, ia_dev = 0, ia_type = IA_INVAL, ia_prot = {suid = 0 '\000', sgid = 0 '\000', sticky = 0 '\000', 
            owner = {read = 0 '\000', write = 0 '\000', exec = 0 '\000'}, group = {read = 0 '\000', write = 0 '\000', exec = 0 '\000'}, other = {read = 0 '\000', 
              write = 0 '\000', exec = 0 '\000'}}, ia_nlink = 0, ia_uid = 0, ia_gid = 0, ia_rdev = 0, ia_size = 0, ia_blksize = 0, ia_blocks = 0, ia_atime = 0, 
          ia_atime_nsec = 0, ia_mtime = 0, ia_mtime_nsec = 0, ia_ctime = 0, ia_ctime_nsec = 0}
        parent = {ia_ino = 0, ia_gfid = '\000' <repeats 15 times>, ia_dev = 0, ia_type = IA_INVAL, ia_prot = {suid = 0 '\000', sgid = 0 '\000', sticky = 0 '\000', 
            owner = {read = 0 '\000', write = 0 '\000', exec = 0 '\000'}, group = {read = 0 '\000', write = 0 '\000', exec = 0 '\000'}, other = {read = 0 '\000', 
              write = 0 '\000', exec = 0 '\000'}}, ia_nlink = 0, ia_uid = 0, ia_gid = 0, ia_rdev = 0, ia_size = 0, ia_blksize = 0, ia_blocks = 0, ia_atime = 0, 
          ia_atime_nsec = 0, ia_mtime = 0, ia_mtime_nsec = 0, ia_ctime = 0, ia_ctime_nsec = 0}
        fix_layout = 0x0
        migrate_data = 0x0
        __FUNCTION__ = "gf_defrag_start_crawl"
#6  0x00007fe315a1f286 in synctask_wrap (old_task=0x7fe2c8200d70) at syncop.c:128
        task = 0x7fe2c8200d70
#7  0x00000032f1e43610 in ?? () from /lib64/libc.so.6
No symbol table info available.


Version-Release number of selected component (if applicable):
3.3.0qa33

gfsc1.sh:-
-----------
#!/bin/bash

mountpoint=$(pwd)
for i in {1..10}
do
	level1_dir=$mountpoint/fuse2.$i
	mkdir "$level1_dir"
	cd "$level1_dir" || exit 1
	for j in {1..20}
	do
		level2_dir=dir.$j
		mkdir "$level2_dir"
		cd "$level2_dir" || exit 1
		for k in {1..100}
		do
			# file.$k is $k MiB of zeros (dd bs=1M count=$k)
			echo "Creating File: $level1_dir/$level2_dir/file.$k"
			dd if=/dev/zero of=file.$k bs=1M count=$k
		done
		cd "$level1_dir" || exit 1
	done
	cd "$mountpoint" || exit 1
done


nfsc1.sh:-
----------
#!/bin/bash

mountpoint=$(pwd)
for i in {1..5}
do
	level1_dir=$mountpoint/nfs2.$i
	mkdir "$level1_dir"
	cd "$level1_dir" || exit 1
	for j in {1..20}
	do
		level2_dir=dir.$j
		mkdir "$level2_dir"
		cd "$level2_dir" || exit 1
		for k in {1..100}
		do
			# file.$k is $k MiB of zeros (dd bs=1M count=$k)
			echo "Creating File: $level1_dir/$level2_dir/file.$k"
			dd if=/dev/zero of=file.$k bs=1M count=$k
		done
		cd "$level1_dir" || exit 1
	done
	cd "$mountpoint" || exit 1
done
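To give a sense of the load behind the crash, the amount of data the two scripts write can be worked out. In every leaf directory `file.k` is k MiB (`dd bs=1M count=k`), so each leaf holds 1 + 2 + ... + 100 = 5050 MiB. The helper names below (`per_dir_mib`, `workload`) are illustrative only and not part of the original report.

```shell
#!/bin/bash
# Rough size of the workload generated by gfsc1.sh and nfsc1.sh.
# Each leaf directory gets file.1 .. file.100, where file.k is k MiB,
# so one leaf holds 1+2+...+100 MiB.
per_dir_mib() {
	local k total=0
	for k in {1..100}; do
		total=$((total + k))
	done
	echo "$total"
}

# usage: workload <top-level dirs> <subdirs per top-level dir>
workload() {
	local dirs=$(( $1 * $2 ))
	local mib=$(( dirs * $(per_dir_mib) ))
	echo "files=$(( dirs * 100 )) MiB=$mib GiB=$(( mib / 1024 ))"
}

echo "gfsc1.sh: $(workload 10 20)"   # 10 fuse2.* dirs x 20 dir.* each
echo "nfsc1.sh: $(workload 5 20)"    # 5 nfs2.* dirs x 20 dir.* each
```

Run as-is, it prints `gfsc1.sh: files=20000 MiB=1010000 GiB=986` and `nfsc1.sh: files=10000 MiB=505000 GiB=493`, i.e. about 1,479 GiB of logical data across both mounts; on a replica-3 volume each file is stored on three bricks.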

Steps to Reproduce:
1. Create a distribute-replicate volume (2x3) and start the volume.
2. Create FUSE and NFS mounts.
3. Run gfsc1.sh from the FUSE mount.
4. Run nfsc1.sh from the NFS mount.
5. Add bricks to the volume (add-brick).
6. Start rebalance.
7. Query rebalance status.
8. Stop rebalance.
9. Bring down 2 bricks from each replica set, so that only one brick is online in each replica set.
10. Bring the bricks back online.
11. Start rebalance with force.
12. Query rebalance status.
13. Stop rebalance.

Repeat steps 9 to 13 3-4 times.
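The server-side part of these steps can be sketched as a dry-run shell script. Everything specific in it is an assumption: the volume name `testvol`, the hosts `server1` to `server3` and the brick paths are hypothetical; the report does not say how bricks were taken offline, so that step only prints a placeholder; and bringing the bricks back with `gluster volume start ... force` is assumed rather than taken from the report. The mount and workload steps are left out. With the default `DRY_RUN=1` the script only prints the `gluster` commands it would run.

```shell
#!/bin/bash
# Dry-run sketch of the server-side reproduction cycle.
# ASSUMPTIONS: volume name, hostnames and brick paths are hypothetical.
VOL=testvol

# Print a command; execute it only when DRY_RUN=0 (default is 1).
run() {
	echo "+ $*"
	if [[ "${DRY_RUN:-1}" == 0 ]]; then
		"$@"
	fi
}

# 2x3 distribute-replicate volume: two replica sets of three bricks each.
setup_volume() {
	run gluster volume create "$VOL" replica 3 \
		server{1..3}:/bricks/b1 server{1..3}:/bricks/b2
	run gluster volume start "$VOL"
}

# Add a third replica set, then one normal rebalance start/status/stop.
expand_and_rebalance() {
	run gluster volume add-brick "$VOL" server{1..3}:/bricks/b3
	run gluster volume rebalance "$VOL" start
	run gluster volume rebalance "$VOL" status
	run gluster volume rebalance "$VOL" stop
}

# One brick-failure cycle followed by a forced rebalance.
failure_cycle() {
	local rset
	# HYPOTHETICAL: the report does not say how bricks were stopped,
	# so only print which two bricks of each replica set to take down.
	for rset in b1 b2 b3; do
		echo "# take offline: server2:/bricks/$rset server3:/bricks/$rset"
	done
	# ASSUMPTION: 'volume start ... force' restarts the offline bricks.
	run gluster volume start "$VOL" force
	run gluster volume rebalance "$VOL" start force
	run gluster volume rebalance "$VOL" status
	run gluster volume rebalance "$VOL" stop
}

setup_volume
expand_and_rebalance
for cycle in 1 2 3; do  # the report says 3-4 repetitions
	failure_cycle
done
```

As written it does not pause for the manual brick shutdown, so it is best read as a checklist of commands rather than an unattended reproducer.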

Actual results:
rebalance process crashed.
Comment 1 shishir gowda 2012-04-05 03:48:29 EDT
Can you please provide the rebalance logs?
Also, please provide the gdb output of the frame in question (frame 5, I believe).
If possible, can the setup information be made available for me to access (it can be mailed across)?
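One minimal way to collect the frame 5 (`gf_defrag_start_crawl`) details asked for here is a gdb batch run against the core file, sketched below under assumptions: the binary path `/usr/sbin/glusterfs` and the core path are placeholders for wherever the rebalance process's binary and core actually live, and the function names `write_gdb_cmds` and `dump_frame5` are hypothetical. `print *defrag` and `print *conf` rely on the debug symbols that the backtrace in the description already shows to be present.

```shell
#!/bin/bash
# Sketch: dump frame 5 (gf_defrag_start_crawl) of a rebalance core.
# ASSUMPTION: binary and core paths are placeholders supplied by the caller.
write_gdb_cmds() {
	cat > "$1" <<'EOF'
set pagination off
bt full
frame 5
info locals
print *defrag
print *conf
EOF
}

# usage: dump_frame5 <glusterfs binary> <core file>
dump_frame5() {
	local binary=$1 core=$2 cmds
	cmds=$(mktemp)
	write_gdb_cmds "$cmds"
	if [[ -f "$core" ]] && command -v gdb >/dev/null 2>&1; then
		gdb -batch -x "$cmds" "$binary" "$core"
	else
		echo "gdb or core file not found; would run:"
		cat "$cmds"
	fi
	rm -f "$cmds"
}
```

For example, `dump_frame5 /usr/sbin/glusterfs /path/to/core` prints the full backtrace followed by the locals and the `defrag` and `conf` structures of frame 5.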
Comment 2 Shwetha Panduranga 2012-04-05 08:02:01 EDT
Created attachment 575383 [details]
rebalance log
Comment 3 Shwetha Panduranga 2012-04-05 08:02:24 EDT
Able to recreate the problem with the above-mentioned steps. Attaching the rebalance logs.
Comment 4 shishir gowda 2012-04-05 13:19:44 EDT
This seems to be a case where an AFR background self-heal is in progress and rebalance has called cleanup_and_exit. Sending parent_down to the xlators does not seem to fix this issue.
Comment 5 shishir gowda 2012-05-08 00:35:53 EDT
Closing this bug as we have switched off self-healing in the rebalance process. Please re-open the bug if you are able to reproduce it.

*** This bug has been marked as a duplicate of bug 808997 ***
Comment 6 shishir gowda 2012-05-08 00:36:50 EDT
Sorry, marked it as a duplicate of the wrong bug.

*** This bug has been marked as a duplicate of bug 808977 ***
Comment 7 Shwetha Panduranga 2012-05-12 10:01:28 EDT
Unable to re-create the same issue.
