Bug 1206655

Summary: glusterd crashes on brick op
Product: [Community] GlusterFS
Reporter: Ravishankar N <ravishankar>
Component: glusterd
Assignee: Atin Mukherjee <amukherj>
Status: CLOSED CURRENTRELEASE
Severity: unspecified
Priority: unspecified
Version: mainline
CC: amukherj, bugs, gluster-bugs
Target Milestone: ---
Keywords: Triaged
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard: GlusterD
Fixed In Version: glusterfs-3.7.0
Doc Type: Bug Fix
Last Closed: 2015-05-14 17:29:25 UTC
Type: Bug
Attachments:
glusterd core (flags: none)

Description Ravishankar N 2015-03-27 16:16:55 UTC
Description of problem:
glusterd crashes (aborts on a failed assertion) when "gluster v heal" is run on a started replica volume.

Version-Release number of selected component (if applicable):
Upstream master, HEAD @ 28c446a4eaaecfba8115dbaf66b3820b9d53257b: features/snapview-client: Don't free un-allocated memory (4 hours ago) <Pranith Kumar K>


How reproducible:
Always

Steps to Reproduce:
1. gluster v create testvol replica 2 127.0.0.2:/home/ravi/bricks/brick{1..2} force
2. gluster v start testvol
3. gluster v heal testvol
   [root@tuxpad review-glusterfs]# gluster v heal testvol
   Connection failed. Please check if gluster daemon is operational

Actual results:
glusterd crashes.

Additional info:
# gdb glusterd core.10976
(gdb) bt
#0  0x00007f551882b8c7 in raise () from /lib64/libc.so.6
#1  0x00007f551882d52a in abort () from /lib64/libc.so.6
#2  0x00007f551882446d in __assert_fail_base () from /lib64/libc.so.6
#3  0x00007f5518824522 in __assert_fail () from /lib64/libc.so.6
#4  0x00007f550f4721ff in glusterd_volume_heal_use_rsp_dict (aggr=0x7f54f402c52c, rsp_dict=0x7f54f40028cc) at glusterd-utils.c:8312
#5  0x00007f550f4d58fe in glusterd_syncop_aggr_rsp_dict (op=GD_OP_HEAL_VOLUME, aggr=0x7f54f402c52c, rsp=0x7f54f40028cc)
    at glusterd-syncop.c:280
#6  0x00007f550f4d8d56 in gd_brick_op_phase (op=GD_OP_HEAL_VOLUME, op_ctx=0x7f54f402c52c, req_dict=0x7f54f40035ec, 
    op_errstr=0x7f5500406b18) at glusterd-syncop.c:1542
#7  0x00007f550f4d9501 in gd_sync_task_begin (op_ctx=0x7f54f402c52c, req=0x7f55000037cc) at glusterd-syncop.c:1723
#8  0x00007f550f4d970c in glusterd_op_begin_synctask (req=0x7f55000037cc, op=GD_OP_HEAL_VOLUME, dict=0x7f54f402c52c)
    at glusterd-syncop.c:1782
#9  0x00007f550f4c5385 in __glusterd_handle_cli_heal_volume (req=0x7f55000037cc) at glusterd-volume-ops.c:782
#10 0x00007f550f4375cd in glusterd_big_locked_handler (req=0x7f55000037cc, 
    actor_fn=0x7f550f4c5004 <__glusterd_handle_cli_heal_volume>) at glusterd-handler.c:82
#11 0x00007f550f4c545e in glusterd_handle_cli_heal_volume (req=0x7f55000037cc) at glusterd-volume-ops.c:800
#12 0x00007f5519c94691 in synctask_wrap (old_task=0x7f55000042d0) at syncop.c:375
#13 0x00007f551883eff0 in ?? () from /lib64/libc.so.6
#14 0x0000000000000000 in ?? ()

(gdb) f 4
#4  0x00007f550f4721ff in glusterd_volume_heal_use_rsp_dict (aggr=0x7f54f402c52c, rsp_dict=0x7f54f40028cc) at glusterd-utils.c:8312
8312            GF_ASSERT (GD_OP_HEAL_VOLUME == op);
(gdb) l
8307            glusterd_op_t  op       = GD_OP_NONE;
8308
8309            GF_ASSERT (rsp_dict);
8310
8311            op = glusterd_op_get_op ();
8312            GF_ASSERT (GD_OP_HEAL_VOLUME == op);
8313
8314            if (aggr) {
8315                    ctx_dict = aggr;
8316
(gdb) p op
$1 = GD_OP_NONE
(gdb)

Comment 1 Ravishankar N 2015-03-27 16:19:56 UTC
Created attachment 1007370 [details]
glusterd core

Comment 2 Ravishankar N 2015-03-27 16:24:39 UTC
This seems to have been caused by http://review.gluster.org/#/c/9908/
After reverting that change, things seem to work fine.

Comment 3 Anand Avati 2015-03-29 14:32:44 UTC
REVIEW: http://review.gluster.org/10034 (glusterd: Use txn_opinfo instead of global op_info in glusterd_volume_heal_use_rsp_dict) posted (#1) for review on master by Atin Mukherjee (amukherj)

Comment 4 Anand Avati 2015-03-30 02:46:04 UTC
COMMIT: http://review.gluster.org/10034 committed in master by Krishnan Parthasarathi (kparthas) 
------
commit 3b647e10124cfc22983b11bf9bfaa36289a2a42f
Author: Atin Mukherjee <amukherj>
Date:   Sun Mar 29 19:59:19 2015 +0530

    glusterd: Use txn_opinfo instead of global op_info in glusterd_volume_heal_use_rsp_dict
    
    Due to http://review.gluster.org/#/c/9908/ the global opinfo is no longer a
    valid placeholder for keeping transaction information for a syncop task.
    glusterd_volume_heal_use_rsp_dict () was still referring to the global
    op_info, so its assertion on the op code always failed.
    
    Change-Id: I1d416fe4edb40962fe7a0f6ecf541602debac56e
    BUG: 1206655
    Signed-off-by: Atin Mukherjee <amukherj>
    Reviewed-on: http://review.gluster.org/10034
    Reviewed-by: Emmanuel Dreyfus <manu>
    Tested-by: Emmanuel Dreyfus <manu>
    Reviewed-by: Venky Shankar <vshankar>
    Tested-by: Gluster Build System <jenkins.com>
    Reviewed-by: Krishnan Parthasarathi <kparthas>
    Tested-by: Krishnan Parthasarathi <kparthas>
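
The commit message above describes moving from a single global opinfo to per-transaction opinfo. The following is a minimal, self-contained sketch of that idea, not the actual glusterd code: every `demo_`-prefixed name, the fixed-size table, and the string transaction id are assumptions made for illustration. Only the enum values GD_OP_NONE and GD_OP_HEAL_VOLUME are taken from the backtrace in this report.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Only these two enum values appear in the bug report; the type name is hypothetical. */
typedef enum {
        GD_OP_NONE = 0,
        GD_OP_HEAL_VOLUME
} demo_op_t;

typedef struct {
        demo_op_t op;
} demo_opinfo_t;

/* Stand-in for the global opinfo. In this sketch nothing ever sets it,
 * mirroring the state seen in gdb ("p op" printed GD_OP_NONE). */
static demo_opinfo_t demo_global_opinfo = { GD_OP_NONE };

/* Hypothetical per-transaction store: a small table keyed by a transaction id. */
#define DEMO_MAX_TXNS   8
#define DEMO_TXN_ID_LEN 64

typedef struct {
        int           in_use;
        char          txn_id[DEMO_TXN_ID_LEN];
        demo_opinfo_t info;
} demo_txn_entry_t;

static demo_txn_entry_t demo_txn_table[DEMO_MAX_TXNS];

/* Record the op for a transaction in the first free slot; -1 if the table is full. */
int
demo_txn_opinfo_set (const char *txn_id, demo_op_t op)
{
        size_t i;

        assert (txn_id != NULL);
        for (i = 0; i < DEMO_MAX_TXNS; i++) {
                if (!demo_txn_table[i].in_use) {
                        demo_txn_table[i].in_use = 1;
                        strncpy (demo_txn_table[i].txn_id, txn_id,
                                 DEMO_TXN_ID_LEN - 1);
                        demo_txn_table[i].txn_id[DEMO_TXN_ID_LEN - 1] = '\0';
                        demo_txn_table[i].info.op = op;
                        return 0;
                }
        }
        return -1;
}

/* Look up the opinfo of a transaction; -1 if the id is unknown. */
int
demo_txn_opinfo_get (const char *txn_id, demo_opinfo_t *out)
{
        size_t i;

        assert (txn_id != NULL && out != NULL);
        for (i = 0; i < DEMO_MAX_TXNS; i++) {
                if (demo_txn_table[i].in_use &&
                    strcmp (demo_txn_table[i].txn_id, txn_id) == 0) {
                        *out = demo_txn_table[i].info;
                        return 0;
                }
        }
        return -1;
}

/* Old pattern: read the op from the global, which the syncop path no longer fills in. */
demo_op_t
demo_op_from_global (void)
{
        return demo_global_opinfo.op;
}

/* New pattern: read the op from the opinfo stored for this transaction. */
demo_op_t
demo_op_from_txn (const char *txn_id)
{
        demo_opinfo_t info = { GD_OP_NONE };

        if (demo_txn_opinfo_get (txn_id, &info) != 0)
                return GD_OP_NONE;
        return info.op;
}

/* Returns 0 when the transaction is a heal op, -1 otherwise.
 * An error return is used here instead of an abort so callers can react. */
int
demo_heal_check (const char *txn_id)
{
        return (demo_op_from_txn (txn_id) == GD_OP_HEAL_VOLUME) ? 0 : -1;
}
```

In this sketch, a caller would register the op with demo_txn_opinfo_set () when the transaction starts, and the response-aggregation code would call demo_op_from_txn () with the same id. Reading the global instead yields GD_OP_NONE, which is the condition that tripped the assertion at glusterd-utils.c:8312.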

Comment 5 Niels de Vos 2015-05-14 17:29:25 UTC
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.7.0, please open a new bug report.

glusterfs-3.7.0 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://thread.gmane.org/gmane.comp.file-systems.gluster.devel/10939
[2] http://thread.gmane.org/gmane.comp.file-systems.gluster.user
