Bug 1134690
| Summary: | [SNAPSHOT]: glusterd crash while snapshot creation was in progress | | |
|---|---|---|---|
| Product: | [Red Hat Storage] Red Hat Gluster Storage | Reporter: | senaik |
| Component: | snapshot | Assignee: | Avra Sengupta <asengupt> |
| Status: | CLOSED ERRATA | QA Contact: | senaik |
| Severity: | urgent | Docs Contact: | |
| Priority: | high | | |
| Version: | rhgs-3.0 | CC: | asrivast, rhs-bugs, rjoseph, smohan, storage-qa-internal, vagarwal |
| Target Milestone: | --- | | |
| Target Release: | RHGS 3.1.0 | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | SNAPSHOT | | |
| Fixed In Version: | glusterfs-3.7.0-3.el6rhs | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | | |
| | 1138577 (view as bug list) | Environment: | |
| Last Closed: | 2015-07-29 04:35:35 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 1138577, 1202842, 1223636 | | |
Description
senaik
2014-08-28 06:21:37 UTC
I was able to hit the crash, but the bt looks different. Followed the same steps as mentioned in "Steps to Reproduce", along with the additional step of trying to create another volume (which was immediately interrupted) while snapshot creation was in progress, and glusterd crashed immediately.

```
Connection failed. Please check if gluster daemon is operational.

real    0m0.098s
user    0m0.077s
sys     0m0.019s
Connection failed. Please check if gluster daemon is operational.
```

I tried the same steps on a 4 node cluster but did not face the issue. Retried on a 12 node cluster and hit the crash.

bt:

```
#0  __gf_free (free_ptr=0x504e280) at mem-pool.c:252
#1  0x000000380244d694 in mem_put (ptr=0x504e29c) at mem-pool.c:526
#2  0x0000003802808e56 in rpcsvc_submit_generic (req=0x504e29c, proghdr=0xfd25770, hdrcount=<value optimized out>, payload=0x0, payloadcount=0, iobref=0x7ff8d0663820) at rpcsvc.c:1266
#3  0x00000038028092f6 in rpcsvc_error_reply (req=0x504e29c) at rpcsvc.c:1285
#4  0x000000380280936b in rpcsvc_check_and_reply_error (ret=-1, frame=<value optimized out>, opaque=0x504e29c) at rpcsvc.c:547
#5  0x000000380245b9ea in synctask_wrap (old_task=<value optimized out>) at syncop.c:335
#6  0x0000003a26e43bf0 in ?? () from /lib64/libc-2.12.so
#7  0x0000000000000000 in ?? ()
(gdb)
```

The steps are already updated in the Description; nevertheless, reposting the steps:

12 node cluster, 6x2 distributed-replicate volume

1. Fuse and NFS mount the volume and create IO:

```
for i in {101..140}; do dd if=/dev/urandom of=file_"$i" bs=1024M count=1; done
for i in {101..140}; do dd if=/dev/urandom of=file_nfs_"$i" bs=1024M count=1; done
```

   Tried a few snapshot operations like snapshot create, list, delete and restore.

2. Created some more snapshots in a loop. While snapshot creation was in progress, tried to create a new volume with existing bricks (which was immediately interrupted), and glusterd crashed immediately. (The snapshot command failed as it took more than 2 minutes, crossing the CLI timeout, and the remaining snapshots failed with the error "Connection failed. Please check if gluster daemon is operational".)

5.

```
[root@dhcp-8-29-179 ~]# service glusterd status
glusterd dead but pid file exists
```

Following is the initial analysis of the bug:

The crash happens during a network disconnect, when the brick-op handler is sending a response back to the originator node. The response is sent by calling glusterd_submit_reply. This function deletes the req object upon completion; it returns an error code (-1) if it fails to send the reply, but it deletes the req object anyway. The brick-op handler is called from the sync-op framework. On error, the sync-op framework calls rpcsvc_check_and_reply_error, which in turn calls glusterd_submit_reply to send the error back to the originator, which deletes the req object again. Therefore, if a network disconnect happens during a brick-op handler, it can lead to a double deletion of the req object and hence the crash.

Version: glusterfs-3.7.1-11.el6rhs.x86_64

Retried the steps as mentioned in the Description on an 8 node cluster and did not observe any crash. Marking the bug 'verified'.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2015-1495.html
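To make the analysis above easier to follow, here is a minimal C sketch of the double-free pattern it describes. This is not glusterd code: fake_req_t, submit_reply, brickop_handler and check_and_reply_error are hypothetical stand-ins for rpcsvc_request_t, glusterd_submit_reply, the glusterd brick-op handler and rpcsvc_check_and_reply_error, and the transport_up flag models the network disconnect. Running it typically aborts inside the allocator with a double-free error, mirroring the crash seen in __gf_free/mem_put in the backtrace.

```c
/*
 * Sketch of the double-free described in the analysis above.
 * All names are simplified stand-ins, not real glusterfs APIs.
 */
#include <stdlib.h>

typedef struct {
    int id;
} fake_req_t;

/* Stand-in for glusterd_submit_reply(): frees the request even when it
 * fails to send the reply, and reports the failure with -1. */
static int
submit_reply(fake_req_t *req, int transport_up)
{
    int ret = transport_up ? 0 : -1;   /* -1: reply could not be sent   */
    free(req);                         /* ...but req is freed anyway    */
    return ret;
}

/* Stand-in for the brick-op handler running under the sync-op framework:
 * it replies itself, so on a disconnect it returns -1 after freeing req. */
static int
brickop_handler(fake_req_t *req, int transport_up)
{
    return submit_reply(req, transport_up);
}

/* Stand-in for rpcsvc_check_and_reply_error(): on error it tries to send
 * an error reply, which frees the same request a second time. */
static void
check_and_reply_error(int ret, fake_req_t *req, int transport_up)
{
    if (ret != 0)
        submit_reply(req, transport_up);   /* second free of req */
}

int
main(void)
{
    fake_req_t *req = calloc(1, sizeof(*req));
    int transport_up = 0;              /* simulate the network disconnect */

    int ret = brickop_handler(req, transport_up);
    check_and_reply_error(ret, req, transport_up);  /* double free -> crash */
    return 0;
}
```

One way out of this pattern is to give the request a single owner: once the reply path has consumed (freed) the request, the error should not be propagated in a way that triggers a second reply, or the error path must be able to tell that a reply was already attempted. The actual change shipped in glusterfs-3.7.0-3.el6rhs may differ from this sketch.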