Bug 810103

Summary: rebalance process crashed
Product: [Community] GlusterFS
Reporter: Shwetha Panduranga <shwetha.h.panduranga>
Component: distribute
Assignee: shishir gowda <sgowda>
Status: CLOSED DUPLICATE
Severity: high
Priority: unspecified
Version: mainline
CC: gluster-bugs, nsathyan
Hardware: Unspecified
OS: Unspecified
Doc Type: Bug Fix
Last Closed: 2012-05-08 04:35:53 UTC
Type: Bug

Attachments:
rebalance log (flags: none)

Description Shwetha Panduranga 2012-04-05 07:25:33 UTC

Description of problem:
(gdb) bt full
#0  0x00000032f1e32885 in raise () from /lib64/libc.so.6
No symbol table info available.
#1  0x00000032f1e34065 in abort () from /lib64/libc.so.6
No symbol table info available.
#2  0x00000032f1e2b9fe in __assert_fail_base () from /lib64/libc.so.6
No symbol table info available.
#3  0x00000032f1e2bac0 in __assert_fail () from /lib64/libc.so.6
No symbol table info available.
#4  0x00007fe315a0e6ab in __gf_free (free_ptr=0x1443400) at mem-pool.c:278
        req_size = 0
        ptr = 0x14433f4 ""
        type = 0
        xl = 0x0
        __PRETTY_FUNCTION__ = "__gf_free"
#5  0x00007fe310ff32e3 in gf_defrag_start_crawl (data=0x1425b00) at dht-rebalance.c:1485
        this = 0x1425b00
        conf = 0x14432f0
        defrag = 0x1443400
        ret = -1
        loc = {path = 0x7fe311032eaf "/", name = 0x0, inode = 0x7fe2d161a04c, parent = 0x0, gfid = '\000' <repeats 15 times>, "\001", 
          pargfid = '\000' <repeats 15 times>}
        iatt = {ia_ino = 0, ia_gfid = '\000' <repeats 15 times>, ia_dev = 0, ia_type = IA_INVAL, ia_prot = {suid = 0 '\000', sgid = 0 '\000', sticky = 0 '\000', 
            owner = {read = 0 '\000', write = 0 '\000', exec = 0 '\000'}, group = {read = 0 '\000', write = 0 '\000', exec = 0 '\000'}, other = {read = 0 '\000', 
              write = 0 '\000', exec = 0 '\000'}}, ia_nlink = 0, ia_uid = 0, ia_gid = 0, ia_rdev = 0, ia_size = 0, ia_blksize = 0, ia_blocks = 0, ia_atime = 0, 
          ia_atime_nsec = 0, ia_mtime = 0, ia_mtime_nsec = 0, ia_ctime = 0, ia_ctime_nsec = 0}
        parent = {ia_ino = 0, ia_gfid = '\000' <repeats 15 times>, ia_dev = 0, ia_type = IA_INVAL, ia_prot = {suid = 0 '\000', sgid = 0 '\000', sticky = 0 '\000', 
            owner = {read = 0 '\000', write = 0 '\000', exec = 0 '\000'}, group = {read = 0 '\000', write = 0 '\000', exec = 0 '\000'}, other = {read = 0 '\000', 
              write = 0 '\000', exec = 0 '\000'}}, ia_nlink = 0, ia_uid = 0, ia_gid = 0, ia_rdev = 0, ia_size = 0, ia_blksize = 0, ia_blocks = 0, ia_atime = 0, 
          ia_atime_nsec = 0, ia_mtime = 0, ia_mtime_nsec = 0, ia_ctime = 0, ia_ctime_nsec = 0}
        fix_layout = 0x0
        migrate_data = 0x0
        __FUNCTION__ = "gf_defrag_start_crawl"
#6  0x00007fe315a1f286 in synctask_wrap (old_task=0x7fe2c8200d70) at syncop.c:128
        task = 0x7fe2c8200d70
#7  0x00000032f1e43610 in ?? () from /lib64/libc.so.6
No symbol table info available.


Version-Release number of selected component (if applicable):
3.3.0qa33

gfsc1.sh:-
-----------
#!/bin/bash

mountpoint=`pwd`
for i in {1..10}
do
	level1_dir=$mountpoint/fuse2.$i
	mkdir $level1_dir
	cd $level1_dir
	for j in {1..20}
	do 
		level2_dir=dir.$j
		mkdir $level2_dir
		cd $level2_dir
		for k in {1..100}
		do 
			echo "Creating File: $level1_dir/$level2_dir/file.$k"
			dd if=/dev/zero of=file.$k bs=1M count=$k 
		done
		cd $level1_dir
	done
	cd $mountpoint
done


nfsc1.sh:-
----------
#!/bin/bash

mountpoint=`pwd`
for i in {1..5}
do 
	level1_dir=$mountpoint/nfs2.$i
	mkdir $level1_dir
	cd $level1_dir
	for j in {1..20}
	do 
		level2_dir=dir.$j
		mkdir $level2_dir
		cd $level2_dir

		for k in {1..100}
		do 
			echo "Creating File: $level1_dir/$level2_dir/file.$k"
			dd if=/dev/zero of=file.$k bs=1M count=$k
		
		done
		cd $level1_dir
	done
	cd $mountpoint
done
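
For a sense of scale, the arithmetic below estimates how much data the two scripts write if both run to completion (the report does not say whether they did). Each dir.$j holds files of 1 to 100 MiB, since dd is called with bs=1M and count=$k. The variable names are illustrative only, and the figures are before replication.

```shell
#!/bin/bash
# Rough size estimate for the gfsc1.sh and nfsc1.sh workloads.
# Assumes both scripts finish; figures are logical data, before replication.

files_per_dir=100
# Sum of 1..100 MiB per second-level directory.
per_dir_mib=$((files_per_dir * (files_per_dir + 1) / 2))

fuse_mib=$((10 * 20 * per_dir_mib))   # gfsc1.sh: 10 top-level dirs x 20 subdirs
nfs_mib=$((5 * 20 * per_dir_mib))     # nfsc1.sh: 5 top-level dirs x 20 subdirs
total_mib=$((fuse_mib + nfs_mib))

echo "per directory: ${per_dir_mib} MiB"
echo "gfsc1.sh:      ${fuse_mib} MiB"
echo "nfsc1.sh:      ${nfs_mib} MiB"
echo "total:         ${total_mib} MiB (about $((total_mib / 1024)) GiB)"
```

With replica 3 bricks, the space consumed on disk would be roughly three times the total.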

Steps to Reproduce:
1. Create a distribute-replicate volume (2x3) and start it.
2. Create FUSE and NFS mounts.
3. Run gfsc1.sh from the FUSE mount.
4. Run nfsc1.sh from the NFS mount.
5. Add bricks to the volume (add-brick).
6. Start rebalance.
7. Query rebalance status.
8. Stop rebalance.
9. Bring down 2 bricks in each replica set, so that only one brick per replica set stays online.
10. Bring the bricks back online.
11. Start rebalance with force.
12. Query rebalance status.
13. Stop rebalance.

Repeat steps 9 to 13 three or four times.
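
The repeated part of the steps above (bring the bricks back online, start a forced rebalance, query status, stop) could be driven by a small wrapper like the sketch below. This is not the reporter's script. The volume name testvol, the CYCLES count, and the DRY_RUN switch are hypothetical, and taking bricks down is left as a manual step because the report does not say how it was done. By default the sketch only prints the gluster commands.

```shell
#!/bin/bash
# Sketch of the repeated rebalance cycle; all defaults below are assumptions.
VOLNAME=${VOLNAME:-testvol}   # hypothetical volume name
CYCLES=${CYCLES:-3}           # the report says "3-4 times"
DRY_RUN=${DRY_RUN:-1}         # 1 = print commands only, 0 = execute them

run() {
	# Print the command in dry-run mode, otherwise execute it.
	if [ "$DRY_RUN" = "1" ]; then
		echo "+ $*"
	else
		"$@"
	fi
}

rebalance_cycle() {
	# Bricks are assumed to have been taken down by hand before this point.
	run gluster volume start "$VOLNAME" force            # restart offline bricks
	run gluster volume rebalance "$VOLNAME" start force
	run gluster volume rebalance "$VOLNAME" status
	run gluster volume rebalance "$VOLNAME" stop
}

n=1
while [ "$n" -le "$CYCLES" ]; do
	echo "cycle $n"
	rebalance_cycle
	n=$((n + 1))
done
```

Setting DRY_RUN=0 runs the commands against a live volume. The sketch does not wait for the rebalance to make progress between status and stop, so the timing that triggered the crash may differ.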

Actual results:
rebalance process crashed.

Comment 1 shishir gowda 2012-04-05 07:48:29 UTC

Can you please provide the rebalance logs?
Also, please attach the gdb output for the frame in question (frame 5, I believe).
If possible, can the setup information be made available for me to access? (It can be mailed across.)
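
One way to collect the requested frame output non-interactively is to run gdb in batch mode against the core file. The sketch below only builds and prints such a command so it can be reviewed first. The binary and core paths are hypothetical placeholders, and printing *defrag is an assumption based on the local variable of that name shown in frame 5 of the backtrace.

```shell
#!/bin/bash
# Sketch: build a gdb batch command for one frame of the core dump.
# BINARY and CORE are placeholder paths; adjust them for the crashed host.
BINARY=${BINARY:-/usr/sbin/glusterfs}
CORE=${CORE:-/path/to/core}

gdb_frame_cmd() {
	# $1 = frame number to select before dumping locals
	printf 'gdb -batch -ex "frame %s" -ex "info locals" -ex "print *defrag" %s %s\n' \
		"$1" "$BINARY" "$CORE"
}

# Frame 5 is gf_defrag_start_crawl in the backtrace from the description.
gdb_frame_cmd 5
```

The printed line can be pasted into a shell on the machine that holds the core file, with debug symbols for glusterfs installed.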

Comment 2 Shwetha Panduranga 2012-04-05 12:02:01 UTC

Created attachment 575383 [details]
rebalance log

Comment 3 Shwetha Panduranga 2012-04-05 12:02:24 UTC

I was able to recreate the problem with the steps mentioned above. Attaching the rebalance logs.

Comment 4 shishir gowda 2012-04-05 17:19:44 UTC

This seems to be a case where AFR background self-heal is still in progress when rebalance calls cleanup_and_exit. Sending parent_down to the xlators does not seem to fix the issue.

Comment 5 shishir gowda 2012-05-08 04:35:53 UTC

Closing this bug, as we have switched off self-healing from the rebalance process. Please reopen the bug if you are able to reproduce it.

*** This bug has been marked as a duplicate of bug 808997 ***

Comment 6 shishir gowda 2012-05-08 04:36:50 UTC

Sorry, I marked it as a duplicate of the wrong bug.

*** This bug has been marked as a duplicate of bug 808977 ***

Comment 7 Shwetha Panduranga 2012-05-12 14:01:28 UTC

Unable to re-create the same issue.