Bug 1269501 - Self-heal daemon crashes when bricks go down at the time of data heal
Status: CLOSED CURRENTRELEASE
Product: GlusterFS
Classification: Community
Component: replicate
Version: 3.7.5
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Assigned To: Pranith Kumar K
Depends On: 1269470 1276234
Blocks: glusterfs-3.7.6
Reported: 2015-10-07 08:39 EDT by Pranith Kumar K
Modified: 2015-11-17 00:59 EST
CC List: 2 users

See Also:
Fixed In Version: glusterfs-3.7.6
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: 1269470
Environment:
Last Closed: 2015-11-17 00:59:41 EST
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments: None
Description Pranith Kumar K 2015-10-07 08:39:39 EDT
+++ This bug was initially created as a clone of Bug #1269470 +++

Description of problem:
When all the bricks go down at the time of data self-heal, the self-heal daemon process crashes with the following backtrace:
(gdb) bt
#0  0x00007fae978ccb0f in afr_local_replies_wipe (local=0x0, priv=0x7fae900125b0) at /home/jenkins/root/workspace/rackspace-regression-2GB-triggered/xlators/cluster/afr/src/afr-common.c:1241
#1  0x00007fae978b7aaf in afr_selfheal_inodelk (frame=0x7fae8c000c0c, this=0x7fae9000a6d0, inode=0x7fae8c00609c, dom=0x7fae900099f0 "patchy-replicate-0", off=8126464, size=131072, locked_on=0x7fae96b4f110 "")
    at /home/jenkins/root/workspace/rackspace-regression-2GB-triggered/xlators/cluster/afr/src/afr-self-heal-common.c:879
#2  0x00007fae978bbeb5 in afr_selfheal_data_block (frame=0x7fae8c000c0c, this=0x7fae9000a6d0, fd=0x7fae8c006e6c, source=0, healed_sinks=0x7fae96b4f8a0 "", offset=8126464, size=131072, type=1, 
    replies=0x7fae96b4f2b0) at /home/jenkins/root/workspace/rackspace-regression-2GB-triggered/xlators/cluster/afr/src/afr-self-heal-data.c:243
#3  0x00007fae978bc91d in afr_selfheal_data_do (frame=0x7fae8c006c9c, this=0x7fae9000a6d0, fd=0x7fae8c006e6c, source=0, healed_sinks=0x7fae96b4f8a0 "", replies=0x7fae96b4f2b0)
    at /home/jenkins/root/workspace/rackspace-regression-2GB-triggered/xlators/cluster/afr/src/afr-self-heal-data.c:365
#4  0x00007fae978bdc7b in __afr_selfheal_data (frame=0x7fae8c006c9c, this=0x7fae9000a6d0, fd=0x7fae8c006e6c, locked_on=0x7fae96b4fa00 "\001\001\240")
    at /home/jenkins/root/workspace/rackspace-regression-2GB-triggered/xlators/cluster/afr/src/afr-self-heal-data.c:719
#5  0x00007fae978be0a0 in afr_selfheal_data (frame=0x7fae8c006c9c, this=0x7fae9000a6d0, inode=0x7fae8c00609c)
    at /home/jenkins/root/workspace/rackspace-regression-2GB-triggered/xlators/cluster/afr/src/afr-self-heal-data.c:808
#6  0x00007fae978ba4d7 in afr_selfheal_do (frame=0x7fae8c006c9c, this=0x7fae9000a6d0, gfid=0x7fae96b4fc30 "s\303\315$w\244M\026\205`\226\336\263\205\300qЦ")
    at /home/jenkins/root/workspace/rackspace-regression-2GB-triggered/xlators/cluster/afr/src/afr-self-heal-common.c:1335
#7  0x00007fae978ba613 in afr_selfheal (this=0x7fae9000a6d0, gfid=0x7fae96b4fc30 "s\303\315$w\244M\026\205`\226\336\263\205\300qЦ")
    at /home/jenkins/root/workspace/rackspace-regression-2GB-triggered/xlators/cluster/afr/src/afr-self-heal-common.c:1380
#8  0x00007fae978c3e20 in afr_shd_selfheal (healer=0x7fae90013130, child=0, gfid=0x7fae96b4fc30 "s\303\315$w\244M\026\205`\226\336\263\205\300qЦ")
    at /home/jenkins/root/workspace/rackspace-regression-2GB-triggered/xlators/cluster/afr/src/afr-self-heald.c:326
#9  0x00007fae978c4142 in afr_shd_index_heal (subvol=0x7fae90006e50, entry=0x7fae90002900, parent=0x7fae96b4fdd0, data=0x7fae90013130)
    at /home/jenkins/root/workspace/rackspace-regression-2GB-triggered/xlators/cluster/afr/src/afr-self-heald.c:416
#10 0x00007faea482aa83 in syncop_dir_scan (subvol=0x7fae90006e50, loc=0x7fae96b4fdd0, pid=-6, data=0x7fae90013130, fn=0x7fae978c4034 <afr_shd_index_heal>)
    at /home/jenkins/root/workspace/rackspace-regression-2GB-triggered/libglusterfs/src/syncop-utils.c:262
#11 0x00007fae978c42bb in afr_shd_index_sweep (healer=0x7fae90013130) at /home/jenkins/root/workspace/rackspace-regression-2GB-triggered/xlators/cluster/afr/src/afr-self-heald.c:450
#12 0x00007fae978c4553 in afr_shd_index_healer (data=0x7fae90013130) at /home/jenkins/root/workspace/rackspace-regression-2GB-triggered/xlators/cluster/afr/src/afr-self-heald.c:518
#13 0x00007faea3a90a51 in start_thread () from ./lib64/libpthread.so.0
#14 0x00007faea33fa93d in clone () from ./lib64/libc.so.6
(gdb) p local->child_up[0]
No symbol "local" in current context.
(gdb) p priv->child_up[0]
$8 = 0 '\000'
(gdb) p priv->child_up[1]
$9 = 0 '\000'

AFR_STACK_RESET() can fail to create the new `local` when all the bricks are down; afr_selfheal_inodelk() then calls afr_local_replies_wipe() with local=NULL (frame #0 above), which dereferences it and crashes.

Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:
Comment 1 Pranith Kumar K 2015-10-29 03:33:50 EDT
http://review.gluster.org/12310
Comment 2 Raghavendra Talur 2015-11-17 00:59:41 EST
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.7.6, please open a new bug report.

glusterfs-3.7.6 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://www.gluster.org/pipermail/gluster-users/2015-November/024359.html
[2] http://thread.gmane.org/gmane.comp.file-systems.gluster.user
