Bug 1261764 - Glusterd crashed with no IO and enabling / disabling heals on ec volume
Summary: Glusterd crashed with no IO and enabling / disabling heals on ec volume
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: glusterd
Version: rhgs-3.1
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Target Release: ---
Assignee: Ashish Pandey
QA Contact: storage-qa-internal@redhat.com
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2015-09-10 06:47 UTC by Bhaskarakiran
Modified: 2017-02-08 13:33 UTC
CC: 7 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-02-08 13:33:27 UTC
Embargoed:


Attachments
core file (14.85 MB, application/zip)
2015-09-10 06:47 UTC, Bhaskarakiran

Description Bhaskarakiran 2015-09-10 06:47:47 UTC
Created attachment 1072032 [details]
core file

Description of problem:
=======================

Observed a glusterd crash while enabling/disabling heal on an EC volume. There is no IO from the client, but heals are running in the background.
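
For reference, the toggle that hits this path is the standard heal CLI; a minimal sketch of the loop used (the volume name "ecvol" is illustrative):

while true; do
    gluster volume heal ecvol enable
    sleep 60
    gluster volume heal ecvol disable
    sleep 60
done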

Backtrace:
==========

(gdb) bt
#0  rpc_transport_submit_request (this=0x7f8a144b4fc0, req=0x7f8a18833d90) at rpc-transport.c:399
#1  0x00007f8a30d709e9 in rpcsvc_callback_submit (rpc=<optimized out>, trans=trans@entry=0x7f8a144b4fc0, prog=prog@entry=0x7f8a25d95410 <glusterd_cbk_prog>, 
    procnum=procnum@entry=1, proghdr=proghdr@entry=0x0, proghdrcount=proghdrcount@entry=0) at rpcsvc.c:1080
#2  0x00007f8a25a47f8d in glusterd_fetchspec_notify (this=<optimized out>) at glusterd.c:247
#3  0x00007f8a25ac6fde in glusterd_create_volfiles_and_notify_services (volinfo=<optimized out>) at glusterd-volgen.c:5464
#4  0x00007f8a25a79cec in glusterd_op_set_volume (errstr=0x7f8a18834880, dict=0x7f8a0b8e80ac) at glusterd-op-sm.c:2598
#5  glusterd_op_commit_perform (op=op@entry=GD_OP_SET_VOLUME, dict=dict@entry=0x7f8a0b8e80ac, op_errstr=op_errstr@entry=0x7f8a18834880, 
    rsp_dict=rsp_dict@entry=0x7f8a0b4d229c) at glusterd-op-sm.c:5530
#6  0x00007f8a25b00a09 in gd_commit_op_phase (op=GD_OP_SET_VOLUME, op_ctx=op_ctx@entry=0x7f8a0b8e80ac, req_dict=0x7f8a0b8e80ac, 
    op_errstr=op_errstr@entry=0x7f8a18834880, txn_opinfo=txn_opinfo@entry=0x7f8a188348a0) at glusterd-syncop.c:1365
#7  0x00007f8a25b02139 in gd_sync_task_begin (op_ctx=op_ctx@entry=0x7f8a0b8e80ac, req=req@entry=0x7f8a32b6501c) at glusterd-syncop.c:1882
#8  0x00007f8a25b022b0 in glusterd_op_begin_synctask (req=req@entry=0x7f8a32b6501c, op=op@entry=GD_OP_SET_VOLUME, dict=dict@entry=0x7f8a0b8e80ac)
    at glusterd-syncop.c:1945
#9  0x00007f8a25aec20e in glusterd_handle_heal_enable_disable (volinfo=<optimized out>, dict=0x7f8a0b8e80ac, req=0x7f8a32b6501c) at glusterd-volume-ops.c:732
#10 __glusterd_handle_cli_heal_volume (req=req@entry=0x7f8a32b6501c) at glusterd-volume-ops.c:802
#11 0x00007f8a25a62c00 in glusterd_big_locked_handler (req=0x7f8a32b6501c, actor_fn=0x7f8a25aebd60 <__glusterd_handle_cli_heal_volume>)
    at glusterd-handler.c:83
#12 0x00007f8a30fed102 in synctask_wrap (old_task=<optimized out>) at syncop.c:381
#13 0x00007f8a2f6ab0f0 in ?? () from /lib64/libc.so.6
#14 0x0000000000000000 in ?? ()
(gdb) 
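
The crash is in rpc_transport_submit_request(), reached from glusterd_fetchspec_notify() while glusterd notifies connected clients/peers of the regenerated volfiles after the volume-set commit. Read together with Comment 4, the likely pattern is that the notification is submitted on a transport that has already been disconnected (ping timeout) and torn down. Below is a minimal, self-contained C sketch of that pattern and of a defensive guard; all structs and names are illustrative and are not glusterd's actual code.

#include <stdio.h>

struct transport_ops {
    int (*submit)(void *trans, const char *msg);
};

struct transport {
    struct transport_ops *ops;   /* torn down (NULL'ed/freed) on disconnect */
    int connected;
};

static int sock_submit(void *trans, const char *msg)
{
    (void)trans;
    printf("submitted: %s\n", msg);
    return 0;
}

static struct transport_ops sock_ops = { .submit = sock_submit };

/* Unsafe: mirrors a blind ops->submit() call; dereferences NULL if the
 * peer disconnected and its ops table was torn down in the meantime. */
static int submit_unsafe(struct transport *t, const char *msg)
{
    return t->ops->submit(t, msg);
}

/* Defensive variant: verify the transport is still usable first. */
static int submit_guarded(struct transport *t, const char *msg)
{
    if (!t || !t->connected || !t->ops || !t->ops->submit)
        return -1;                 /* peer gone; skip the notification */
    return t->ops->submit(t, msg);
}

int main(void)
{
    struct transport peer = { .ops = &sock_ops, .connected = 1 };

    submit_guarded(&peer, "fetchspec notify");       /* delivered */

    /* Simulate a ping-timeout disconnect racing with the notification. */
    peer.connected = 0;
    peer.ops = NULL;

    if (submit_guarded(&peer, "fetchspec notify") < 0)
        fprintf(stderr, "peer disconnected, notification skipped\n");

    /* submit_unsafe(&peer, "fetchspec notify"); -- would crash here */
    (void)submit_unsafe;
    return 0;
}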


Version-Release number of selected component (if applicable):
=============================================================
3.7.1-14


How reproducible:
=================
Every time: each heal enable/disable cycle results in a crash.


Steps to Reproduce:
1. Create an 8+4 EC (disperse) volume.
2. FUSE mount it on the client and create a data set of ~250 GB.
3. Bring down 2 of the bricks while IO is in progress.
4. Let the IO complete, bring the bricks back, and trigger a full heal.
5. Enable/disable heal in a loop at 1-minute intervals (see the shell sketch below).
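
A rough shell sketch of the above steps (host names, brick paths, mount point, and the volume name "ecvol" are illustrative):

# 1. Create an 8+4 EC volume (8 data + 4 redundancy bricks) and start it
gluster volume create ecvol disperse-data 8 redundancy 4 \
    server{1..12}:/bricks/ecvol/brick force
gluster volume start ecvol

# 2. FUSE mount on the client and create ~250 GB of data
mkdir -p /mnt/ecvol
mount -t glusterfs server1:/ecvol /mnt/ecvol

# 3. Bring down 2 of the bricks while IO is in progress
kill <brick-pid>           # repeat for a second brick process

# 4. After the IO completes, bring the bricks back and trigger a full heal
gluster volume start ecvol force
gluster volume heal ecvol full

# 5. Toggle heal enable/disable at 1-minute intervals, as in the loop
#    shown in the description above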

Actual results:
===============
Glusterd crash

Expected results:
=================
No crashes.

Additional info:
================
Attaching the core file.

Comment 4 Atin Mukherjee 2017-02-08 13:33:27 UTC
This crash was observed when ping timeout was enabled for GlusterD-to-GlusterD communication. We have no plans to re-enable this option, hence closing this bug.
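
For context, the knob referred to here appears to be the ping-timeout option in glusterd's own management volfile; an illustrative excerpt of /etc/glusterfs/glusterd.vol (option names and defaults can vary between releases):

volume management
    type mgmt/glusterd
    option working-directory /var/lib/glusterd
    option transport-type socket
    # 0 keeps ping timeout disabled for GlusterD-to-GlusterD connections,
    # which is the behaviour the comment above refers to preserving
    option ping-timeout 0
end-volume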

