Bug 1269501 - Self-heal daemon crashes when bricks go down at the time of data heal
Status: CLOSED CURRENTRELEASE
Product: GlusterFS
Classification: Community
Component: replicate
Version: 3.7.5
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Assigned To: Pranith Kumar K
Depends On: 1269470 1276234
Blocks: glusterfs-3.7.6
Reported: 2015-10-07 08:39 EDT by Pranith Kumar K
Modified: 2015-11-17 00:59 EST
CC List: 2 users

See Also:
Fixed In Version: glusterfs-3.7.6
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: 1269470
Environment:
Last Closed: 2015-11-17 00:59:41 EST
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments: None
Description Pranith Kumar K 2015-10-07 08:39:39 EDT
+++ This bug was initially created as a clone of Bug #1269470 +++

Description of problem:
When all the bricks go down at the time of data self-heal, the self-heal daemon process crashes with the following backtrace:
(gdb) bt
#0  0x00007fae978ccb0f in afr_local_replies_wipe (local=0x0, priv=0x7fae900125b0) at /home/jenkins/root/workspace/rackspace-regression-2GB-triggered/xlators/cluster/afr/src/afr-common.c:1241
#1  0x00007fae978b7aaf in afr_selfheal_inodelk (frame=0x7fae8c000c0c, this=0x7fae9000a6d0, inode=0x7fae8c00609c, dom=0x7fae900099f0 "patchy-replicate-0", off=8126464, size=131072, locked_on=0x7fae96b4f110 "")
    at /home/jenkins/root/workspace/rackspace-regression-2GB-triggered/xlators/cluster/afr/src/afr-self-heal-common.c:879
#2  0x00007fae978bbeb5 in afr_selfheal_data_block (frame=0x7fae8c000c0c, this=0x7fae9000a6d0, fd=0x7fae8c006e6c, source=0, healed_sinks=0x7fae96b4f8a0 "", offset=8126464, size=131072, type=1, 
    replies=0x7fae96b4f2b0) at /home/jenkins/root/workspace/rackspace-regression-2GB-triggered/xlators/cluster/afr/src/afr-self-heal-data.c:243
#3  0x00007fae978bc91d in afr_selfheal_data_do (frame=0x7fae8c006c9c, this=0x7fae9000a6d0, fd=0x7fae8c006e6c, source=0, healed_sinks=0x7fae96b4f8a0 "", replies=0x7fae96b4f2b0)
    at /home/jenkins/root/workspace/rackspace-regression-2GB-triggered/xlators/cluster/afr/src/afr-self-heal-data.c:365
#4  0x00007fae978bdc7b in __afr_selfheal_data (frame=0x7fae8c006c9c, this=0x7fae9000a6d0, fd=0x7fae8c006e6c, locked_on=0x7fae96b4fa00 "\001\001\240")
    at /home/jenkins/root/workspace/rackspace-regression-2GB-triggered/xlators/cluster/afr/src/afr-self-heal-data.c:719
#5  0x00007fae978be0a0 in afr_selfheal_data (frame=0x7fae8c006c9c, this=0x7fae9000a6d0, inode=0x7fae8c00609c)
    at /home/jenkins/root/workspace/rackspace-regression-2GB-triggered/xlators/cluster/afr/src/afr-self-heal-data.c:808
#6  0x00007fae978ba4d7 in afr_selfheal_do (frame=0x7fae8c006c9c, this=0x7fae9000a6d0, gfid=0x7fae96b4fc30 "s\303\315$w\244M\026\205`\226\336\263\205\300qЦ")
    at /home/jenkins/root/workspace/rackspace-regression-2GB-triggered/xlators/cluster/afr/src/afr-self-heal-common.c:1335
#7  0x00007fae978ba613 in afr_selfheal (this=0x7fae9000a6d0, gfid=0x7fae96b4fc30 "s\303\315$w\244M\026\205`\226\336\263\205\300qЦ")
    at /home/jenkins/root/workspace/rackspace-regression-2GB-triggered/xlators/cluster/afr/src/afr-self-heal-common.c:1380
#8  0x00007fae978c3e20 in afr_shd_selfheal (healer=0x7fae90013130, child=0, gfid=0x7fae96b4fc30 "s\303\315$w\244M\026\205`\226\336\263\205\300qЦ")
    at /home/jenkins/root/workspace/rackspace-regression-2GB-triggered/xlators/cluster/afr/src/afr-self-heald.c:326
#9  0x00007fae978c4142 in afr_shd_index_heal (subvol=0x7fae90006e50, entry=0x7fae90002900, parent=0x7fae96b4fdd0, data=0x7fae90013130)
    at /home/jenkins/root/workspace/rackspace-regression-2GB-triggered/xlators/cluster/afr/src/afr-self-heald.c:416
#10 0x00007faea482aa83 in syncop_dir_scan (subvol=0x7fae90006e50, loc=0x7fae96b4fdd0, pid=-6, data=0x7fae90013130, fn=0x7fae978c4034 <afr_shd_index_heal>)
    at /home/jenkins/root/workspace/rackspace-regression-2GB-triggered/libglusterfs/src/syncop-utils.c:262
#11 0x00007fae978c42bb in afr_shd_index_sweep (healer=0x7fae90013130) at /home/jenkins/root/workspace/rackspace-regression-2GB-triggered/xlators/cluster/afr/src/afr-self-heald.c:450
#12 0x00007fae978c4553 in afr_shd_index_healer (data=0x7fae90013130) at /home/jenkins/root/workspace/rackspace-regression-2GB-triggered/xlators/cluster/afr/src/afr-self-heald.c:518
#13 0x00007faea3a90a51 in start_thread () from ./lib64/libpthread.so.0
#14 0x00007faea33fa93d in clone () from ./lib64/libc.so.6
(gdb) p local->child_up[0]
No symbol "local" in current context.
(gdb) p priv->child_up[0]
$8 = 0 '\000'
(gdb) p priv->child_up[1]
$9 = 0 '\000'

AFR_STACK_RESET() can fail to create the new `local` when all the bricks are down; afr_selfheal_inodelk() then calls afr_local_replies_wipe() with local=NULL (frame #0 above), which dereferences it and crashes.

Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:
Comment 1 Pranith Kumar K 2015-10-29 03:33:50 EDT
http://review.gluster.org/12310
Comment 2 Raghavendra Talur 2015-11-17 00:59:41 EST
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.7.6, please open a new bug report.

glusterfs-3.7.6 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://www.gluster.org/pipermail/gluster-users/2015-November/024359.html
[2] http://thread.gmane.org/gmane.comp.file-systems.gluster.user
