Description of problem:
Created 256 snaps in a loop and then started deleting them in a loop, while a dd was ongoing on the mountpoint. Out of the 4 nodes, glusterd crashed on two, each with a different backtrace.

(gdb) bt
#0  uuid_unpack (in=0x700000010 <Address 0x700000010 out of bounds>, uu=0x10e98e0) at ../../contrib/uuid/unpack.c:44
#1  0x000000350c657283 in uuid_compare (uu1=<value optimized out>, uu2=0xcd6158 "\305\006<FZdC\252\236pf\247Qt\360\276/var/lib/glusterd") at ../../contrib/uuid/compare.c:46
#2  0x00007fb39103fb3e in glusterd_snap_volume_remove (rsp_dict=0x7fb39910d48c, snap_vol=0x7fb37fffffb8, remove_lvm=_gf_true, force=_gf_true) at glusterd-snapshot.c:1135
#3  0x00007fb39103fef3 in glusterd_snap_remove (rsp_dict=0x7fb39910d48c, snap=0x7fb3807f9180, remove_lvm=_gf_true, force=_gf_true) at glusterd-snapshot.c:1246
#4  0x00007fb391045118 in glusterd_snapshot_create_commit (dict=0x7fb39910dd4c, op_errstr=<value optimized out>, rsp_dict=0x7fb39910d48c) at glusterd-snapshot.c:3817
#5  0x00007fb3910454ee in glusterd_snapshot (dict=0x7fb39910dd4c, op_errstr=0x10eab80, rsp_dict=0x7fb39910d48c) at glusterd-snapshot.c:4889
#6  0x00007fb39104ab1e in gd_mgmt_v3_commit_fn (op=GD_OP_SNAP, dict=0x7fb39910dd4c, op_errstr=0x10eab80, rsp_dict=0x7fb39910d48c) at glusterd-mgmt.c:207
#7  0x00007fb391047893 in glusterd_handle_commit_fn (req=0x7fb390a5e02c) at glusterd-mgmt-handler.c:548
#8  0x00007fb390fa771f in glusterd_big_locked_handler (req=0x7fb390a5e02c, actor_fn=0x7fb391047640 <glusterd_handle_commit_fn>) at glusterd-handler.c:78
#9  0x000000350c657c22 in synctask_wrap (old_task=<value optimized out>) at syncop.c:333
#10 0x0000003352843bf0 in ?? () from /lib64/libc.so.6
#11 0x0000000000000000 in ?? ()

Version-Release number of selected component (if applicable):
glusterfs 3.5qa2 built on Apr 13 2014 20:41:18

How reproducible:
The delete fails consistently, but the crash is not consistent.

Steps to Reproduce:
1. Create 256 snaps on a volume while dd is running on the mountpoint.
2. While dd is still ongoing, start deleting the snaps in a loop.

Actual results:
A few of the deletes fail, and two glusterd crashes were found.

On one node:
(gdb) bt
#0  uuid_unpack (in=0x700000010 <Address 0x700000010 out of bounds>, uu=0x10e98e0) at ../../contrib/uuid/unpack.c:44
#1  0x000000350c657283 in uuid_compare (uu1=<value optimized out>, uu2=0xcd6158 "\305\006<FZdC\252\236pf\247Qt\360\276/var/lib/glusterd") at ../../contrib/uuid/compare.c:46
#2  0x00007fb39103fb3e in glusterd_snap_volume_remove (rsp_dict=0x7fb39910d48c, snap_vol=0x7fb37fffffb8, remove_lvm=_gf_true, force=_gf_true) at glusterd-snapshot.c:1135
#3  0x00007fb39103fef3 in glusterd_snap_remove (rsp_dict=0x7fb39910d48c, snap=0x7fb3807f9180, remove_lvm=_gf_true, force=_gf_true) at glusterd-snapshot.c:1246
#4  0x00007fb391045118 in glusterd_snapshot_create_commit (dict=0x7fb39910dd4c, op_errstr=<value optimized out>, rsp_dict=0x7fb39910d48c) at glusterd-snapshot.c:3817
#5  0x00007fb3910454ee in glusterd_snapshot (dict=0x7fb39910dd4c, op_errstr=0x10eab80, rsp_dict=0x7fb39910d48c) at glusterd-snapshot.c:4889
#6  0x00007fb39104ab1e in gd_mgmt_v3_commit_fn (op=GD_OP_SNAP, dict=0x7fb39910dd4c, op_errstr=0x10eab80, rsp_dict=0x7fb39910d48c) at glusterd-mgmt.c:207
#7  0x00007fb391047893 in glusterd_handle_commit_fn (req=0x7fb390a5e02c) at glusterd-mgmt-handler.c:548
#8  0x00007fb390fa771f in glusterd_big_locked_handler (req=0x7fb390a5e02c, actor_fn=0x7fb391047640 <glusterd_handle_commit_fn>) at glusterd-handler.c:78
#9  0x000000350c657c22 in synctask_wrap (old_task=<value optimized out>) at syncop.c:333
#10 0x0000003352843bf0 in ?? () from /lib64/libc.so.6
#11 0x0000000000000000 in ?? ()

On the other node:
(gdb) bt
#0  0x00000000019441c0 in ?? ()
#1  0x0000003706408196 in rpcsvc_transport_submit (trans=<value optimized out>, rpchdr=<value optimized out>, rpchdrcount=<value optimized out>, proghdr=<value optimized out>, proghdrcount=<value optimized out>, progpayload=<value optimized out>, progpayloadcount=0, iobref=0x7f21388e0c40, priv=0x0) at rpcsvc.c:1006
#2  0x0000003706408b18 in rpcsvc_submit_generic (req=0x7f21401d502c, proghdr=0x1d56a80, hdrcount=<value optimized out>, payload=0x0, payloadcount=0, iobref=0x7f21388e0c40) at rpcsvc.c:1190
#3  0x0000003706408f46 in rpcsvc_error_reply (req=0x7f21401d502c) at rpcsvc.c:1238
#4  0x0000003706408fbb in rpcsvc_check_and_reply_error (ret=-1, frame=<value optimized out>, opaque=0x7f21401d502c) at rpcsvc.c:492
#5  0x0000003706057c3a in synctask_wrap (old_task=<value optimized out>) at syncop.c:335
#6  0x00000039fea43bf0 in ?? () from /lib64/libc.so.6
#7  0x0000000000000000 in ?? ()

glusterd on the other two nodes did not crash.

Expected results:
The deletes should not fail, and glusterd should not crash.

Additional info:
The issue looks similar to Bug# 1088355, and the fix for it (patch# 7579) has been posted upstream. Can you please try running this test case once the patch is merged upstream?
Patch http://review.gluster.org/#/c/7579/ has been merged upstream.
Setting flags required to add BZs to RHS 3.0 Errata
Seems to be working; I did not hit any crash.

Tested in version: glusterfs-3.6.0.10-1.el6rhs.x86_64

Moving to verified.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. http://rhn.redhat.com/errata/RHEA-2014-1278.html