Bug 1098971 - [REBALANCE] glusterfs rebalance process core dumps while rebalancing deep directory structure
Summary: [REBALANCE] glusterfs rebalance process core dumps while rebalancing deep directory structure
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: glusterfs
Version: rhgs-3.0
Hardware: x86_64
OS: Linux
Priority: urgent
Severity: urgent
Target Milestone: ---
Target Release: RHGS 3.0.0
Assignee: vsomyaju
QA Contact: Sachidananda Urs
URL:
Whiteboard:
Depends On: 1034108 1104653 1138922 1139984
Blocks:
 
Reported: 2014-05-19 09:21 UTC by Sachidananda Urs
Modified: 2015-05-13 16:59 UTC
CC: 5 users

Fixed In Version: glusterfs-3.6.0.18-1
Doc Type: Bug Fix
Doc Text:
Cause: Due to a race condition, the gfid obtained for a name via readdirp and the gfid found for the same name by a subsequent lookup can differ. Rebalance was triggered even when such a gfid mismatch was found, and the rebalance process crashed. Fix: Do not allow rebalance of the entry in that case. The readdirp of an entry brings the gfid, which is stored in the inode through inode_link; when a lookup is done and the gfid brought by the lookup differs from the one stored in the inode, client3_3_lookup_cbk returns ESTALE, and the error is caught by the rebalance process, which skips the entry.
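A minimal sketch of the guard this fix describes (check_gfid_mismatch and struct inode_sketch are illustrative names, not the actual GlusterFS source; in the real code the comparison happens in client3_3_lookup_cbk):

#include <errno.h>
#include <stdint.h>
#include <string.h>

typedef uint8_t gfid_t[16];             /* a gfid is a 16-byte UUID */

struct inode_sketch {
        gfid_t gfid;                    /* linked into the inode at readdirp time */
};

/* Fail with ESTALE when the gfid returned by lookup differs from the
 * gfid already stored in the inode, so rebalance skips the entry
 * instead of migrating it under a stale identity. */
static int
check_gfid_mismatch(const struct inode_sketch *inode, const gfid_t lookup_gfid)
{
        if (memcmp(inode->gfid, lookup_gfid, sizeof(gfid_t)) != 0)
                return -ESTALE;         /* mismatch: do not rebalance */
        return 0;                       /* gfids agree: safe to proceed */
}

The rebalance process treats the ESTALE as a failed lookup and leaves the entry alone.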
Clone Of:
Environment:
Last Closed: 2014-09-22 19:38:11 UTC


Attachments
Log and core files (4.33 MB, application/x-tar), 2014-05-19 09:21 UTC, Sachidananda Urs
Complete backtrace (40.92 KB, text/plain), 2014-05-19 09:33 UTC, Sachidananda Urs


Links
Red Hat Product Errata RHEA-2014:1278 (SHIPPED_LIVE): Red Hat Storage Server 3.0 bug fix and enhancement update, 2014-09-22 23:26:55 UTC

Description Sachidananda Urs 2014-05-19 09:21:45 UTC
Created attachment 897058 [details]
Log and core files

Description of problem:

Start the rebalance process after removing an existing brick and adding a new brick.
After around 30 minutes the rebalance process crashes, and the rebalance status is shown as "failed":

[root@g60ds-2 ~]#  gluster volume rebalance sixtydrive status
                                    Node Rebalanced-files          size       scanned      failures       skipped               status   run time in secs
                               ---------      -----------   -----------   -----------   -----------   -----------         ------------     --------------
                               localhost                0        0Bytes        111468             0             0               failed            2153.00
                             172.17.69.1                0        0Bytes         97354             0         20149               failed            2162.00
volume rebalance: sixtydrive: success: 


Version-Release number of selected component (if applicable):

[root@g60ds-2 ~]# gluster --version
glusterfs 3.6.0.3 built on May 17 2014 10:49:46


How reproducible:

Always.

Steps to Reproduce:
1. Create a large amount of data (a deep directory structure).
2. Remove a brick (its data is migrated off).
3. Add a new brick and start rebalance (example commands below).
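The corresponding CLI sequence (volume name taken from this report; brick paths are illustrative):

# populate the volume with a deep directory tree, then:
gluster volume remove-brick sixtydrive g60ds-2:/bricks/brick1 start
gluster volume remove-brick sixtydrive g60ds-2:/bricks/brick1 status   # wait for completion
gluster volume remove-brick sixtydrive g60ds-2:/bricks/brick1 commit
gluster volume add-brick sixtydrive g60ds-2:/bricks/brick2
gluster volume rebalance sixtydrive start
gluster volume rebalance sixtydrive status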

Actual results:

glusterfs rebalance process crashes.


Additional info:

Backtrace:
(gdb) bt
#0  0x00007f6e0ef0222f in dht_layout_entry_cmp_volname (layout=0x7f6e04023ec0, i=0, j=<value optimized out>) at dht-layout.c:434
#1  0x00007f6e0ef0228d in dht_layout_sort_volname (layout=0x7f6e04023ec0) at dht-layout.c:506
#2  0x00007f6e0ef0b48b in dht_fix_layout_of_directory (frame=0x7f6e1c90c5ec, loc=0x7f6e0df97800, layout=0x14006d0) at dht-selfheal.c:776
#3  0x00007f6e0ef0cd59 in dht_fix_directory_layout (frame=<value optimized out>, dir_cbk=<value optimized out>, layout=0x14006d0)
    at dht-selfheal.c:915
#4  0x00007f6e0ef1ed82 in dht_setxattr (frame=0x7f6e1c90c5ec, this=0x13dded0, loc=0x7f6e0b386000, xattr=0x7f6e1c3060b4, flags=0, 
    xdata=0x0) at dht-common.c:2621
#5  0x00007f6e1db10761 in syncop_setxattr (subvol=0x13dded0, loc=0x7f6e0b386000, dict=0x7f6e1c3060b4, flags=0) at syncop.c:1314
#6  0x00007f6e0ef07ad1 in gf_defrag_fix_layout (this=0x13dded0, defrag=0x13ff8e0, loc=0x7f6e0b386220, fix_layout=0x7f6e1c3060b4, 
    migrate_data=0x7f6e1c306140) at dht-rebalance.c:1575
#7  0x00007f6e0ef07af5 in gf_defrag_fix_layout (this=0x13dded0, defrag=0x13ff8e0, loc=0x7f6e0b386440, fix_layout=0x7f6e1c3060b4, 
    migrate_data=0x7f6e1c306140) at dht-rebalance.c:1586
#8  0x00007f6e0ef07af5 in gf_defrag_fix_layout (this=0x13dded0, defrag=0x13ff8e0, loc=0x7f6e0b386660, fix_layout=0x7f6e1c3060b4, 
    migrate_data=0x7f6e1c306140) at dht-rebalance.c:1586
#9  0x00007f6e0ef07af5 in gf_defrag_fix_layout (this=0x13dded0, defrag=0x13ff8e0, loc=0x7f6e0b386880, fix_layout=0x7f6e1c3060b4, 
    migrate_data=0x7f6e1c306140) at dht-rebalance.c:1586
#10 0x00007f6e0ef07af5 in gf_defrag_fix_layout (this=0x13dded0, defrag=0x13ff8e0, loc=0x7f6e0b386aa0, fix_layout=0x7f6e1c3060b4, 
    migrate_data=0x7f6e1c306140) at dht-rebalance.c:1586
#11 0x00007f6e0ef07af5 in gf_defrag_fix_layout (this=0x13dded0, defrag=0x13ff8e0, loc=0x7f6e0b386cc0, fix_layout=0x7f6e1c3060b4, 
    migrate_data=0x7f6e1c306140) at dht-rebalance.c:1586
#12 0x00007f6e0ef07af5 in gf_defrag_fix_layout (this=0x13dded0, defrag=0x13ff8e0, loc=0x7f6e0b386ee0, fix_layout=0x7f6e1c3060b4, 
    migrate_data=0x7f6e1c306140) at dht-rebalance.c:1586
#13 0x00007f6e0ef07af5 in gf_defrag_fix_layout (this=0x13dded0, defrag=0x13ff8e0, loc=0x7f6e0b387100, fix_layout=0x7f6e1c3060b4, 
    migrate_data=0x7f6e1c306140) at dht-rebalance.c:1586
#14 0x00007f6e0ef07af5 in gf_defrag_fix_layout (this=0x13dded0, defrag=0x13ff8e0, loc=0x7f6e0b387320, fix_layout=0x7f6e1c3060b4, 
    migrate_data=0x7f6e1c306140) at dht-rebalance.c:1586
#15 0x00007f6e0ef07af5 in gf_defrag_fix_layout (this=0x13dded0, defrag=0x13ff8e0, loc=0x7f6e0b387540, fix_layout=0x7f6e1c3060b4, 
    migrate_data=0x7f6e1c306140) at dht-rebalance.c:1586
#16 0x00007f6e0ef07af5 in gf_defrag_fix_layout (this=0x13dded0, defrag=0x13ff8e0, loc=0x7f6e0b387760, fix_layout=0x7f6e1c3060b4, 
    migrate_data=0x7f6e1c306140) at dht-rebalance.c:1586
#17 0x00007f6e0ef07af5 in gf_defrag_fix_layout (this=0x13dded0, defrag=0x13ff8e0, loc=0x7f6e0b387980, fix_layout=0x7f6e1c3060b4, 
    migrate_data=0x7f6e1c306140) at dht-rebalance.c:1586
#18 0x00007f6e0ef07af5 in gf_defrag_fix_layout (this=0x13dded0, defrag=0x13ff8e0, loc=0x7f6e0b387ba0, fix_layout=0x7f6e1c3060b4, 
    migrate_data=0x7f6e1c306140) at dht-rebalance.c:1586
#19 0x00007f6e0ef07af5 in gf_defrag_fix_layout (this=0x13dded0, defrag=0x13ff8e0, loc=0x7f6e0b387dc0, fix_layout=0x7f6e1c3060b4, 
    migrate_data=0x7f6e1c306140) at dht-rebalance.c:1586
#20 0x00007f6e0ef07af5 in gf_defrag_fix_layout (this=0x13dded0, defrag=0x13ff8e0, loc=0x7f6e0b387f60, fix_layout=0x7f6e1c3060b4, 
    migrate_data=0x7f6e1c306140) at dht-rebalance.c:1586
#21 0x00007f6e0ef08086 in gf_defrag_start_crawl (data=0x13dded0) at dht-rebalance.c:1705
#22 0x00007f6e1db0a5d2 in synctask_wrap (old_task=<value optimized out>) at syncop.c:333
#23 0x0000003558a43bf0 in ?? () from /lib64/libc-2.12.so
#24 0x0000000000000000 in ?? ()
(gdb) 
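Frames #6 through #20 are all the same function, gf_defrag_fix_layout, called once per directory level: the layout fix is a depth-first crawl, so the stack grows linearly with directory depth, which is why a deep directory structure exposes this. A runnable sketch of that recursion pattern (fix_layout_of_dir is a hypothetical stand-in for the per-directory setxattr; this is not the actual dht-rebalance.c code):

#include <dirent.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the per-directory layout fix
 * (in DHT this is a setxattr carrying the new layout). */
static int
fix_layout_of_dir(const char *path)
{
        printf("fixing layout of %s\n", path);
        return 0;
}

/* Depth-first crawl: each subdirectory is handled by a recursive
 * call, so the stack gains one frame per directory level, matching
 * the repeated gf_defrag_fix_layout frames in the backtrace. */
static int
defrag_fix_layout(const char *path)
{
        int ret = fix_layout_of_dir(path);
        if (ret != 0)
                return ret;

        DIR *dp = opendir(path);
        if (dp == NULL)
                return -1;

        struct dirent *de;
        while (ret == 0 && (de = readdir(dp)) != NULL) {
                if (de->d_type != DT_DIR ||
                    strcmp(de->d_name, ".") == 0 ||
                    strcmp(de->d_name, "..") == 0)
                        continue;
                char child[4096];
                snprintf(child, sizeof(child), "%s/%s", path, de->d_name);
                ret = defrag_fix_layout(child); /* +1 stack frame per depth */
        }
        closedir(dp);
        return ret;
}

int
main(int argc, char **argv)
{
        return defrag_fix_layout(argc > 1 ? argv[1] : ".") ? 1 : 0;
}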


=================
Log and core files are attached.

Comment 1 Sachidananda Urs 2014-05-19 09:33:21 UTC
Created attachment 897061 [details]
Complete backtrace

Comment 14 Sachidananda Urs 2014-06-27 06:32:02 UTC
Verified on: glusterfs 3.6.0.22

Comment 16 errata-xmlrpc 2014-09-22 19:38:11 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHEA-2014-1278.html

