Bug 1452956 - glusterd on a node crashed after running volume profile command
Summary: glusterd on a node crashed after running volume profile command
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: GlusterFS
Classification: Community
Component: glusterd
Version: mainline
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Assignee: Gaurav Yadav
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks: 1452205 1454612
 
Reported: 2017-05-21 05:28 UTC by Gaurav Yadav
Modified: 2017-09-05 17:30 UTC
CC: 9 users

Fixed In Version: glusterfs-3.12.0
Clone Of: 1452205
Environment:
Last Closed: 2017-09-05 17:30:44 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:



Description Gaurav Yadav 2017-05-21 05:28:47 UTC
+++ This bug was initially created as a clone of Bug #1452205 +++

Description of problem:
=======================
glusterd on a node crashed after running volume profile command.

Version-Release number of selected component (if applicable):
3.8.4-24.el7rhgs

How reproducible:
1/1

Steps to Reproduce:
===================
1) Create two gluster volumes and start them.
2) Enable brick mux on the volumes.
3) Start volume profile on the two volumes in a single command line, as below:

gluster v profile <vol1> info | gluster v profile <vol2> info

glusterd crashes on the node where the command is executed.

[2017-05-15 10:17:50.465730] I [MSGID: 106568] [glusterd-svc-mgmt.c:228:glusterd_svc_stop] 0-management: scrub service is stopped
pending frames:
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
patchset: git://git.gluster.com/glusterfs.git
signal received: 11
time of crash:
2017-05-15 10:18:19
configuration details:
argp 1
backtrace 1
dlfcn 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 3.8.4
/lib64/libglusterfs.so.0(_gf_msg_backtrace_nomem+0xc2)[0x7f79eca6a0e2]
/lib64/libglusterfs.so.0(gf_print_trace+0x324)[0x7f79eca73b04]
/lib64/libc.so.6(+0x35250)[0x7f79eb14c250]
/usr/lib64/glusterfs/3.8.4/xlator/mgmt/glusterd.so(+0x42120)[0x7f79e15b0120]
/usr/lib64/glusterfs/3.8.4/xlator/mgmt/glusterd.so(+0x3e40f)[0x7f79e15ac40f]
/usr/lib64/glusterfs/3.8.4/xlator/mgmt/glusterd.so(+0x6dcb5)[0x7f79e15dbcb5]
/usr/lib64/glusterfs/3.8.4/xlator/mgmt/glusterd.so(+0x6ee6a)[0x7f79e15dce6a]
/lib64/libgfrpc.so.0(rpc_clnt_handle_reply+0x90)[0x7f79ec833840]
/lib64/libgfrpc.so.0(rpc_clnt_notify+0x1e7)[0x7f79ec833b27]
/lib64/libgfrpc.so.0(rpc_transport_notify+0x23)[0x7f79ec82f9e3]
/usr/lib64/glusterfs/3.8.4/rpc-transport/socket.so(+0x73b4)[0x7f79de9fc3b4]
/usr/lib64/glusterfs/3.8.4/rpc-transport/socket.so(+0x9895)[0x7f79de9fe895]
/lib64/libglusterfs.so.0(+0x83e00)[0x7f79ecac3e00]
/lib64/libpthread.so.0(+0x7dc5)[0x7f79eb8c9dc5]
/lib64/libc.so.6(clone+0x6d)[0x7f79eb20e73d]
---------

(gdb) bt 
#0  glusterd_op_ac_rcvd_brick_op_acc (event=0x7f79d04449a0, ctx=0x0) at glusterd-op-sm.c:7544
#1  0x00007f79e15ac40f in glusterd_op_sm () at glusterd-op-sm.c:8091
#2  0x00007f79e15dbcb5 in __glusterd_stage_op_cbk (req=req@entry=0x7f79d006aaa0, 
    iov=iov@entry=0x7f79d006aae0, count=count@entry=1, myframe=myframe@entry=0x7f79d0429580)
    at glusterd-rpc-ops.c:1279
#3  0x00007f79e15dce6a in glusterd_big_locked_cbk (req=0x7f79d006aaa0, iov=0x7f79d006aae0, 
    count=1, myframe=0x7f79d0429580, fn=0x7f79e15db790 <__glusterd_stage_op_cbk>)
    at glusterd-rpc-ops.c:215
#4  0x00007f79ec833840 in rpc_clnt_handle_reply (clnt=clnt@entry=0x7f79ee704e00, 
    pollin=pollin@entry=0x7f79d0427fe0) at rpc-clnt.c:794
#5  0x00007f79ec833b27 in rpc_clnt_notify (trans=<optimized out>, mydata=0x7f79ee704e30, 
    event=<optimized out>, data=0x7f79d0427fe0) at rpc-clnt.c:987
#6  0x00007f79ec82f9e3 in rpc_transport_notify (this=this@entry=0x7f79ee705000, 
    event=event@entry=RPC_TRANSPORT_MSG_RECEIVED, data=data@entry=0x7f79d0427fe0)
    at rpc-transport.c:538
#7  0x00007f79de9fc3b4 in socket_event_poll_in (this=this@entry=0x7f79ee705000)
    at socket.c:2275
#8  0x00007f79de9fe895 in socket_event_handler (fd=<optimized out>, idx=2, 
    data=0x7f79ee705000, poll_in=1, poll_out=0, poll_err=0) at socket.c:2411
#9  0x00007f79ecac3e00 in event_dispatch_epoll_handler (event=0x7f79dca4fe80, 
    event_pool=0x7f79ee5dd730) at event-epoll.c:572
#10 event_dispatch_epoll_worker (data=0x7f79ee6364d0) at event-epoll.c:675
#11 0x00007f79eb8c9dc5 in start_thread () from /lib64/libpthread.so.0
#12 0x00007f79eb20e73d in clone () from /lib64/libc.so.6

Actual results:
===============
Glusterd crashed

Expected results:
=================
No crashes.

--- Additional comment from Red Hat Bugzilla Rules Engine on 2017-05-18 10:37:45 EDT ---

This bug is automatically being proposed for the current release of Red Hat Gluster Storage 3 under active development, by setting the release flag 'rhgs-3.3.0' to '?'.

If this bug should be proposed for a different release, please manually change the proposed release flag.

--- Additional comment from Prasad Desala on 2017-05-18 10:45:30 EDT ---

sosreports and core@ http://rhsqe-repo.lab.eng.blr.redhat.com/sosreports/Prasad/1452205/

[root@dhcp43-49 /]# gluster v status
Status of volume: distrep
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.70.43.49:/bricks/brick5/b5         49152     0          Y       26443
Brick 10.70.43.41:/bricks/brick5/b5         49152     0          Y       18144
Brick 10.70.43.35:/bricks/brick5/b5         49152     0          Y       19343
Brick 10.70.43.37:/bricks/brick5/b5         49152     0          Y       18841
Brick 10.70.43.31:/bricks/brick5/b5         49152     0          Y       19375
Brick 10.70.43.27:/bricks/brick5/b5         49152     0          Y       18892
Brick 10.70.43.49:/bricks/brick6/b6         49153     0          Y       26451
Brick 10.70.43.41:/bricks/brick6/b6         49153     0          Y       18152
Brick 10.70.43.35:/bricks/brick6/b6         49153     0          Y       19351
Brick 10.70.43.37:/bricks/brick6/b6         49153     0          Y       18849
Brick 10.70.43.31:/bricks/brick6/b6         49152     0          Y       19383
Brick 10.70.43.27:/bricks/brick6/b6         49153     0          Y       18900
Brick 10.70.43.49:/bricks/brick7/b7         49154     0          Y       26454
Brick 10.70.43.41:/bricks/brick7/b7         49155     0          Y       18162
Brick 10.70.43.35:/bricks/brick7/b7         49154     0          Y       19359
Brick 10.70.43.37:/bricks/brick7/b7         49154     0          Y       18857
Brick 10.70.43.31:/bricks/brick7/b7         49152     0          Y       19390
Brick 10.70.43.27:/bricks/brick7/b7         N/A       N/A        N       N/A  
Brick 10.70.43.49:/bricks/brick8/b8         49155     0          Y       26460
Brick 10.70.43.41:/bricks/brick8/b8         49155     0          Y       18162
Brick 10.70.43.35:/bricks/brick8/b8         49155     0          Y       19366
Brick 10.70.43.37:/bricks/brick8/b8         49155     0          Y       18860
Brick 10.70.43.31:/bricks/brick8/b8         49154     0          Y       19390
Brick 10.70.43.27:/bricks/brick8/b8         49155     0          Y       18917
Self-heal Daemon on localhost               N/A       N/A        Y       27433
Self-heal Daemon on 10.70.43.37             N/A       N/A        Y       19589
Self-heal Daemon on 10.70.43.27             N/A       N/A        Y       19611
Self-heal Daemon on 10.70.43.41             N/A       N/A        Y       19024
Self-heal Daemon on 10.70.43.35             N/A       N/A        Y       20099
Self-heal Daemon on 10.70.43.31             N/A       N/A        Y       23769
 
Task Status of Volume distrep
------------------------------------------------------------------------------
Task                 : Remove brick        
ID                   : ebbbd0db-983b-4294-be2d-8a7810feb96f
Removed bricks:     
10.70.43.49:/bricks/brick5/b5
10.70.43.41:/bricks/brick5/b5
10.70.43.35:/bricks/brick5/b5
10.70.43.37:/bricks/brick5/b5
10.70.43.31:/bricks/brick5/b5
10.70.43.27:/bricks/brick5/b5
Status               : failed              
 
Status of volume: distrep_3
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.70.43.49:/bricks/brick0/b0         49156     0          Y       26479
Brick 10.70.43.41:/bricks/brick0/b0         49156     0          Y       18175
Brick 10.70.43.35:/bricks/brick0/b0         49156     0          Y       19375
Brick 10.70.43.37:/bricks/brick0/b0         N/A       N/A        N       N/A  
Brick 10.70.43.31:/bricks/brick0/b0         49152     0          Y       19490
Brick 10.70.43.27:/bricks/brick0/b0         N/A       N/A        N       N/A  
Brick 10.70.43.49:/bricks/brick1/b1         49157     0          Y       26487
Brick 10.70.43.41:/bricks/brick1/b1         49156     0          Y       18175
Brick 10.70.43.35:/bricks/brick1/b1         49157     0          Y       19386
Brick 10.70.43.37:/bricks/brick1/b1         49157     0          Y       18876
Brick 10.70.43.31:/bricks/brick1/b1         49152     0          Y       19501
Brick 10.70.43.27:/bricks/brick1/b1         49157     0          Y       18932
Self-heal Daemon on localhost               N/A       N/A        Y       27433
Self-heal Daemon on 10.70.43.41             N/A       N/A        Y       19024
Self-heal Daemon on 10.70.43.37             N/A       N/A        Y       19589
Self-heal Daemon on 10.70.43.35             N/A       N/A        Y       20099
Self-heal Daemon on 10.70.43.27             N/A       N/A        Y       19611
Self-heal Daemon on 10.70.43.31             N/A       N/A        Y       23769
 
Task Status of Volume distrep_3
------------------------------------------------------------------------------
There are no active volume tasks
 
Status of volume: new
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.70.43.49:/bricks/brick2/b2         49158     0          Y       26493
Brick 10.70.43.41:/bricks/brick2/b2         49157     0          Y       18313
Brick 10.70.43.35:/bricks/brick2/b2         49158     0          Y       19393
Brick 10.70.43.37:/bricks/brick2/b2         49158     0          Y       18889
Brick 10.70.43.31:/bricks/brick2/b2         49155     0          Y       19509
Brick 10.70.43.27:/bricks/brick2/b2         49158     0          Y       18942
 
Task Status of Volume new
------------------------------------------------------------------------------
There are no active volume tasks
 
Status of volume: test
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.70.43.49:/bricks/brick3/b3         49159     0          Y       26745
Brick 10.70.43.41:/bricks/brick3/b3         49158     0          Y       18476
Brick 10.70.43.35:/bricks/brick3/b3         49159     0          Y       19588
Brick 10.70.43.37:/bricks/brick3/b3         49159     0          Y       19081
Brick 10.70.43.31:/bricks/brick3/b3         49159     0          Y       20152
Brick 10.70.43.27:/bricks/brick3/b3         49160     0          Y       19591
Snapshot Daemon on localhost                49160     0          Y       27359
Self-heal Daemon on localhost               N/A       N/A        Y       27433
Quota Daemon on localhost                   N/A       N/A        Y       27442
Snapshot Daemon on 10.70.43.37              49160     0          Y       19531
Self-heal Daemon on 10.70.43.37             N/A       N/A        Y       19589
Quota Daemon on 10.70.43.37                 N/A       N/A        Y       19599
Snapshot Daemon on 10.70.43.41              49159     0          Y       18966
Self-heal Daemon on 10.70.43.41             N/A       N/A        Y       19024
Quota Daemon on 10.70.43.41                 N/A       N/A        Y       19033
Snapshot Daemon on 10.70.43.35              49160     0          Y       20049
Self-heal Daemon on 10.70.43.35             N/A       N/A        Y       20099
Quota Daemon on 10.70.43.35                 N/A       N/A        Y       20108
Snapshot Daemon on 10.70.43.27              49159     0          Y       19541
Self-heal Daemon on 10.70.43.27             N/A       N/A        Y       19611
Quota Daemon on 10.70.43.27                 N/A       N/A        Y       19620
Snapshot Daemon on 10.70.43.31              49158     0          Y       20098
Self-heal Daemon on 10.70.43.31             N/A       N/A        Y       23769
Quota Daemon on 10.70.43.31                 N/A       N/A        Y       23779
 
Task Status of Volume test
------------------------------------------------------------------------------
There are no active volume tasks


 
[root@dhcp43-49 /]# gluster v info
 
Volume Name: distrep
Type: Distributed-Replicate
Volume ID: 78e69a54-88d9-4b21-b6a0-3bc412849f80
Status: Started
Snapshot Count: 0
Number of Bricks: 12 x 2 = 24
Transport-type: tcp
Bricks:
Brick1: 10.70.43.49:/bricks/brick5/b5
Brick2: 10.70.43.41:/bricks/brick5/b5
Brick3: 10.70.43.35:/bricks/brick5/b5
Brick4: 10.70.43.37:/bricks/brick5/b5
Brick5: 10.70.43.31:/bricks/brick5/b5
Brick6: 10.70.43.27:/bricks/brick5/b5
Brick7: 10.70.43.49:/bricks/brick6/b6
Brick8: 10.70.43.41:/bricks/brick6/b6
Brick9: 10.70.43.35:/bricks/brick6/b6
Brick10: 10.70.43.37:/bricks/brick6/b6
Brick11: 10.70.43.31:/bricks/brick6/b6
Brick12: 10.70.43.27:/bricks/brick6/b6
Brick13: 10.70.43.49:/bricks/brick7/b7
Brick14: 10.70.43.41:/bricks/brick7/b7
Brick15: 10.70.43.35:/bricks/brick7/b7
Brick16: 10.70.43.37:/bricks/brick7/b7
Brick17: 10.70.43.31:/bricks/brick7/b7
Brick18: 10.70.43.27:/bricks/brick7/b7
Brick19: 10.70.43.49:/bricks/brick8/b8
Brick20: 10.70.43.41:/bricks/brick8/b8
Brick21: 10.70.43.35:/bricks/brick8/b8
Brick22: 10.70.43.37:/bricks/brick8/b8
Brick23: 10.70.43.31:/bricks/brick8/b8
Brick24: 10.70.43.27:/bricks/brick8/b8
Options Reconfigured:
storage.batch-fsync-delay-usec: 0
server.allow-insecure: on
nfs.disable: on
transport.address-family: inet
performance.nl-cache: on
features.cache-invalidation: on
features.cache-invalidation-timeout: 600
diagnostics.latency-measurement: on
diagnostics.count-fop-hits: on
cluster.brick-multiplex: enable
 
Volume Name: distrep_3
Type: Distributed-Replicate
Volume ID: b5a46a4e-81ff-4dae-9f3a-711876ea4fba
Status: Started
Snapshot Count: 0
Number of Bricks: 4 x 3 = 12
Transport-type: tcp
Bricks:
Brick1: 10.70.43.49:/bricks/brick0/b0
Brick2: 10.70.43.41:/bricks/brick0/b0
Brick3: 10.70.43.35:/bricks/brick0/b0
Brick4: 10.70.43.37:/bricks/brick0/b0
Brick5: 10.70.43.31:/bricks/brick0/b0
Brick6: 10.70.43.27:/bricks/brick0/b0
Brick7: 10.70.43.49:/bricks/brick1/b1
Brick8: 10.70.43.41:/bricks/brick1/b1
Brick9: 10.70.43.35:/bricks/brick1/b1
Brick10: 10.70.43.37:/bricks/brick1/b1
Brick11: 10.70.43.31:/bricks/brick1/b1
Brick12: 10.70.43.27:/bricks/brick1/b1
Options Reconfigured:
transport.address-family: inet
nfs.disable: on
diagnostics.latency-measurement: on
diagnostics.count-fop-hits: on
cluster.brick-multiplex: enable
 
Volume Name: new
Type: Distributed-Replicate
Volume ID: ee1f5d0a-e6dd-4ece-928b-4efbf27377f6
Status: Started
Snapshot Count: 0
Number of Bricks: 3 x 2 = 6
Transport-type: tcp
Bricks:
Brick1: 10.70.43.49:/bricks/brick2/b2
Brick2: 10.70.43.41:/bricks/brick2/b2
Brick3: 10.70.43.35:/bricks/brick2/b2
Brick4: 10.70.43.37:/bricks/brick2/b2
Brick5: 10.70.43.31:/bricks/brick2/b2
Brick6: 10.70.43.27:/bricks/brick2/b2
Options Reconfigured:
nfs.disable: on
transport.address-family: inet
cluster.self-heal-daemon: off
cluster.brick-multiplex: enable
 
Volume Name: test
Type: Distributed-Replicate
Volume ID: a8940f7f-6af9-4e97-bd02-2335340fd2fd
Status: Started
Snapshot Count: 0
Number of Bricks: 3 x 2 = 6
Transport-type: tcp
Bricks:
Brick1: 10.70.43.49:/bricks/brick3/b3
Brick2: 10.70.43.41:/bricks/brick3/b3
Brick3: 10.70.43.35:/bricks/brick3/b3
Brick4: 10.70.43.37:/bricks/brick3/b3
Brick5: 10.70.43.31:/bricks/brick3/b3
Brick6: 10.70.43.27:/bricks/brick3/b3
Options Reconfigured:
features.uss: enable
features.quota-deem-statfs: on
features.inode-quota: on
features.quota: on
transport.address-family: inet
nfs.disable: on
cluster.brick-multiplex: enable

--- Additional comment from Red Hat Bugzilla Rules Engine on 2017-05-19 04:30:37 EDT ---

This bug is automatically being provided 'pm_ack+' for the release flag 'rhgs-3.3.0', the current release of Red Hat Gluster Storage 3 under active development, having been appropriately marked for the release, and having been provided ACK from Development and QE.

--- Additional comment from Red Hat Bugzilla Rules Engine on 2017-05-19 07:27:12 EDT ---

Since this bug has been approved for the RHGS 3.3.0 release of Red Hat Gluster Storage 3, through release flag 'rhgs-3.3.0+', and through the Internal Whiteboard entry of '3.3.0', the Target Release is being automatically set to 'RHGS 3.3.0'

Comment 1 Worker Ant 2017-05-22 05:19:39 UTC
REVIEW: https://review.gluster.org/17350 (glusterd : volume profile command on one of the node crashes glusterd) posted (#1) for review on master by Gaurav Yadav (gyadav)

Comment 2 Worker Ant 2017-05-22 11:31:57 UTC
REVIEW: https://review.gluster.org/17350 (glusterd : volume profile command on one of the node crashes glusterd) posted (#2) for review on master by Gaurav Yadav (gyadav)

Comment 3 Worker Ant 2017-05-23 04:40:13 UTC
COMMIT: https://review.gluster.org/17350 committed in master by Atin Mukherjee (amukherj) 
------
commit 8dc63c8824fc1a00c873c16e8a16a14fca7c8cca
Author: Gaurav Yadav <gyadav>
Date:   Sun May 21 12:31:29 2017 +0530

    glusterd : volume profile command on one of the node crashes glusterd
    
    When the volume profile command is issued on one of the nodes, glusterd
    crashes. It is a race condition that can be hit when a profile command
    and a status command are executed from node A and node B respectively.
    While doing so, the event GD_OP_STATE_BRICK_OP_SENT/GD_OP_STATE_BRICK_COMMITTED
    is triggered. As the handling of the event is not thread safe, the
    context gets modified and glusterd crashes.
    
    With the fix, the context is now validated before it is used.
    
    Change-Id: Ic07c3cdc5644677b0e40ff0fac6fcca834158913
    BUG: 1452956
    Signed-off-by: Gaurav Yadav <gyadav>
    Reviewed-on: https://review.gluster.org/17350
    Smoke: Gluster Build System <jenkins.org>
    CentOS-regression: Gluster Build System <jenkins.org>
    Reviewed-by: Samikshan Bairagya <samikshan>
    NetBSD-regression: NetBSD Build System <jenkins.org>
    Reviewed-by: Atin Mukherjee <amukherj>
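
The fix described in this commit boils down to validating the op-sm event context before dereferencing it (the gdb backtrace above shows glusterd_op_ac_rcvd_brick_op_acc being entered with ctx=0x0). Below is a minimal, self-contained C sketch of that defensive pattern; it is not the upstream patch, and the struct and function names are simplified stand-ins:

#include <stdio.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for glusterd's brick-op response context. */
typedef struct {
        void *rsp_dict;              /* stand-in for the brick's response dict */
} brick_rsp_ctx_t;

static int
handle_brick_op_acc (brick_rsp_ctx_t *ev_ctx)
{
        int ret = -1;

        /* The missing validation: a racing profile/status transaction can
         * deliver this op-sm event with a NULL ctx, so bail out instead of
         * dereferencing it. */
        if (ev_ctx == NULL) {
                fprintf (stderr, "brick-op ack received with NULL ctx, ignoring\n");
                goto out;
        }

        if (ev_ctx->rsp_dict != NULL)
                printf ("aggregating brick-op reply\n");

        ret = 0;
out:
        return ret;
}

int
main (void)
{
        brick_rsp_ctx_t ctx = { .rsp_dict = NULL };

        handle_brick_op_acc (&ctx);   /* normal path */
        handle_brick_op_acc (NULL);   /* racing path that used to crash */
        return 0;
}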

Comment 4 Worker Ant 2017-06-06 12:23:02 UTC
REVIEW: https://review.gluster.org/17478 (glusterd: fix glusterd crash from glusterd_op_ac_rcvd_brick_op_acc) posted (#1) for review on master by Atin Mukherjee (amukherj)

Comment 5 Worker Ant 2017-06-07 05:48:05 UTC
COMMIT: https://review.gluster.org/17478 committed in master by Atin Mukherjee (amukherj) 
------
commit bae51359b4a3a7a9c16424b43eb5ad14f0fcad53
Author: Atin Mukherjee <amukherj>
Date:   Tue Jun 6 17:45:51 2017 +0530

    glusterd: fix glusterd crash from glusterd_op_ac_rcvd_brick_op_acc
    
    In the 'out' label, before checking ev_ctx->rsp_dict we should first
    check that ev_ctx is not NULL.
    
    Change-Id: I28f4f1ee9070617a0e6a23a43af8c5756f96a47e
    BUG: 1452956
    Signed-off-by: Atin Mukherjee <amukherj>
    Reviewed-on: https://review.gluster.org/17478
    Smoke: Gluster Build System <jenkins.org>
    NetBSD-regression: NetBSD Build System <jenkins.org>
    CentOS-regression: Gluster Build System <jenkins.org>
    Reviewed-by: Gaurav Yadav <gyadav>
    Reviewed-by: Samikshan Bairagya <samikshan>
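
The second fix hardens the error path itself. As a rough, hedged sketch (simplified stand-in names, not glusterd's actual code), the point is that in the shared 'out' label ev_ctx must be tested before ev_ctx->rsp_dict is touched:

#include <stdio.h>

/* Hypothetical stand-ins; glusterd's real context carries a dict_t response
 * dictionary that the 'out' label inspects. */
typedef struct {
        int populated;
} rsp_dict_t;

typedef struct {
        rsp_dict_t *rsp_dict;
} brick_rsp_ctx_t;

static int
ack_handler (brick_rsp_ctx_t *ev_ctx, int fail_early)
{
        int ret = -1;

        if (fail_early)
                goto out;             /* can reach 'out' while ev_ctx is NULL */

        /* ... normal processing of the brick-op acknowledgement ... */
        ret = 0;

out:
        /* Check ev_ctx first; only then is ev_ctx->rsp_dict safe to read. */
        if (ev_ctx != NULL && ev_ctx->rsp_dict != NULL)
                printf ("releasing rsp_dict\n");   /* stand-in for the real cleanup */
        return ret;
}

int
main (void)
{
        rsp_dict_t dict = { 1 };
        brick_rsp_ctx_t ctx = { &dict };

        ack_handler (&ctx, 0);        /* normal path */
        ack_handler (NULL, 1);        /* failure path that previously crashed */
        return 0;
}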

Comment 6 Qarion 2017-06-23 09:50:35 UTC
Had a very similar crash after running profile start followed by profile info (~2 min later), both on the same server.
[2017-06-21 09:41:24.109867]  : volume profile gluster start : SUCCESS
[2017-06-21 09:42:54.713030]  : volume profile gluster info : FAILED : error

3-node replica with 1 volume (all 3 nodes are both clients and servers).
The brick on node 1 crashed right after the profile info.
The brick on node 2 followed on 2017-06-21 09:44:07 with exactly the same symptoms/log entry, without any command having been run at that moment, causing the clients to become read-only.

pending frames:
frame : type(0) op(27)
frame : type(0) op(0)
patchset: git://git.gluster.org/glusterfs.git
signal received: 11
time of crash: 
2017-06-21 09:44:07
configuration details:
argp 1
backtrace 1
dlfcn 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 3.10.1
/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(_gf_msg_backtrace_nomem+0xaa)[0x7f2b0dcefbda]
/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(gf_print_trace+0x324)[0x7f2b0dcf9294]
/lib/x86_64-linux-gnu/libc.so.6(+0x354b0)[0x7f2b0d0e54b0]
/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(default_lookup+0x1b)[0x7f2b0dd6072b]
/usr/lib/x86_64-linux-gnu/glusterfs/3.10.1/xlator/debug/io-stats.so(+0x4533)[0x7f2b01354533]
/usr/lib/x86_64-linux-gnu/glusterfs/3.10.1/xlator/protocol/server.so(+0x2d19f)[0x7f2b0113319f]
/usr/lib/x86_64-linux-gnu/glusterfs/3.10.1/xlator/protocol/server.so(+0xb746)[0x7f2b01111746]
/usr/lib/x86_64-linux-gnu/glusterfs/3.10.1/xlator/protocol/server.so(+0xb7d5)[0x7f2b011117d5]
/usr/lib/x86_64-linux-gnu/glusterfs/3.10.1/xlator/protocol/server.so(+0xc14c)[0x7f2b0111214c]
/usr/lib/x86_64-linux-gnu/glusterfs/3.10.1/xlator/protocol/server.so(+0xb81e)[0x7f2b0111181e]
/usr/lib/x86_64-linux-gnu/glusterfs/3.10.1/xlator/protocol/server.so(+0xbeeb)[0x7f2b01111eeb]
/usr/lib/x86_64-linux-gnu/glusterfs/3.10.1/xlator/protocol/server.so(+0xc178)[0x7f2b01112178]
/usr/lib/x86_64-linux-gnu/glusterfs/3.10.1/xlator/protocol/server.so(+0xb7fe)[0x7f2b011117fe]
/usr/lib/x86_64-linux-gnu/glusterfs/3.10.1/xlator/protocol/server.so(+0xc1f4)[0x7f2b011121f4]
/usr/lib/x86_64-linux-gnu/glusterfs/3.10.1/xlator/protocol/server.so(+0x2d379)[0x7f2b01133379]
/usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpcsvc_handle_rpc_call+0x325)[0x7f2b0dabb2a5]
/usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpcsvc_notify+0x17e)[0x7f2b0dabb51e]
/usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_transport_notify+0x23)[0x7f2b0dabd403]
/usr/lib/x86_64-linux-gnu/glusterfs/3.10.1/rpc-transport/socket.so(+0x6b66)[0x7f2b08e55b66]
/usr/lib/x86_64-linux-gnu/glusterfs/3.10.1/rpc-transport/socket.so(+0x8d4f)[0x7f2b08e57d4f]
/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(+0x7663a)[0x7f2b0dd4363a]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x76ba)[0x7f2b0d4806ba]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x6d)[0x7f2b0d1b682d]
---------

Comment 7 Atin Mukherjee 2017-06-23 10:59:58 UTC
(In reply to Qarion from comment #6)

The backtrace doesn't indicate a glusterd crash. You need to open a separate bug with the core file attached.

Comment 8 Shyamsundar 2017-09-05 17:30:44 UTC
This bug is being closed because a release has been made available that should address the reported issue. If the problem is still not fixed with glusterfs-3.12.0, please open a new bug report.

glusterfs-3.12.0 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://lists.gluster.org/pipermail/announce/2017-September/000082.html
[2] https://www.gluster.org/pipermail/gluster-users/

