Description of problem:
Created 256 snaps in a loop and then started deleting them in a loop, while a dd was ongoing on the mountpoint. Out of the 4 nodes, glusterd crashed on two, each with a different backtrace.

(gdb) bt
#0  uuid_unpack (in=0x700000010 <Address 0x700000010 out of bounds>, uu=0x10e98e0) at ../../contrib/uuid/unpack.c:44
#1  0x000000350c657283 in uuid_compare (uu1=<value optimized out>, uu2=0xcd6158 "\305\006<FZdC\252\236pf\247Qt\360\276/var/lib/glusterd") at ../../contrib/uuid/compare.c:46
#2  0x00007fb39103fb3e in glusterd_snap_volume_remove (rsp_dict=0x7fb39910d48c, snap_vol=0x7fb37fffffb8, remove_lvm=_gf_true, force=_gf_true) at glusterd-snapshot.c:1135
#3  0x00007fb39103fef3 in glusterd_snap_remove (rsp_dict=0x7fb39910d48c, snap=0x7fb3807f9180, remove_lvm=_gf_true, force=_gf_true) at glusterd-snapshot.c:1246
#4  0x00007fb391045118 in glusterd_snapshot_create_commit (dict=0x7fb39910dd4c, op_errstr=<value optimized out>, rsp_dict=0x7fb39910d48c) at glusterd-snapshot.c:3817
#5  0x00007fb3910454ee in glusterd_snapshot (dict=0x7fb39910dd4c, op_errstr=0x10eab80, rsp_dict=0x7fb39910d48c) at glusterd-snapshot.c:4889
#6  0x00007fb39104ab1e in gd_mgmt_v3_commit_fn (op=GD_OP_SNAP, dict=0x7fb39910dd4c, op_errstr=0x10eab80, rsp_dict=0x7fb39910d48c) at glusterd-mgmt.c:207
#7  0x00007fb391047893 in glusterd_handle_commit_fn (req=0x7fb390a5e02c) at glusterd-mgmt-handler.c:548
#8  0x00007fb390fa771f in glusterd_big_locked_handler (req=0x7fb390a5e02c, actor_fn=0x7fb391047640 <glusterd_handle_commit_fn>) at glusterd-handler.c:78
#9  0x000000350c657c22 in synctask_wrap (old_task=<value optimized out>) at syncop.c:333
#10 0x0000003352843bf0 in ?? () from /lib64/libc.so.6
#11 0x0000000000000000 in ?? ()

Version-Release number of selected component (if applicable):
glusterfs 3.5qa2 built on Apr 13 2014 20:41:18

How reproducible:
The delete fails consistently, but the crash is not consistent.

Steps to Reproduce:
1. Create 256 snaps on a volume while dd is running on the mountpoint.
2. While dd is still ongoing, start deleting the snaps in a loop.

Actual results:
A few of the deletes fail, and two glusterd crashes were found.

On one node:
(gdb) bt
#0  uuid_unpack (in=0x700000010 <Address 0x700000010 out of bounds>, uu=0x10e98e0) at ../../contrib/uuid/unpack.c:44
#1  0x000000350c657283 in uuid_compare (uu1=<value optimized out>, uu2=0xcd6158 "\305\006<FZdC\252\236pf\247Qt\360\276/var/lib/glusterd") at ../../contrib/uuid/compare.c:46
#2  0x00007fb39103fb3e in glusterd_snap_volume_remove (rsp_dict=0x7fb39910d48c, snap_vol=0x7fb37fffffb8, remove_lvm=_gf_true, force=_gf_true) at glusterd-snapshot.c:1135
#3  0x00007fb39103fef3 in glusterd_snap_remove (rsp_dict=0x7fb39910d48c, snap=0x7fb3807f9180, remove_lvm=_gf_true, force=_gf_true) at glusterd-snapshot.c:1246
#4  0x00007fb391045118 in glusterd_snapshot_create_commit (dict=0x7fb39910dd4c, op_errstr=<value optimized out>, rsp_dict=0x7fb39910d48c) at glusterd-snapshot.c:3817
#5  0x00007fb3910454ee in glusterd_snapshot (dict=0x7fb39910dd4c, op_errstr=0x10eab80, rsp_dict=0x7fb39910d48c) at glusterd-snapshot.c:4889
#6  0x00007fb39104ab1e in gd_mgmt_v3_commit_fn (op=GD_OP_SNAP, dict=0x7fb39910dd4c, op_errstr=0x10eab80, rsp_dict=0x7fb39910d48c) at glusterd-mgmt.c:207
#7  0x00007fb391047893 in glusterd_handle_commit_fn (req=0x7fb390a5e02c) at glusterd-mgmt-handler.c:548
#8  0x00007fb390fa771f in glusterd_big_locked_handler (req=0x7fb390a5e02c, actor_fn=0x7fb391047640 <glusterd_handle_commit_fn>) at glusterd-handler.c:78
#9  0x000000350c657c22 in synctask_wrap (old_task=<value optimized out>) at syncop.c:333
#10 0x0000003352843bf0 in ?? () from /lib64/libc.so.6
#11 0x0000000000000000 in ?? ()

On the other node:
(gdb) bt
#0  0x00000000019441c0 in ?? ()
#1  0x0000003706408196 in rpcsvc_transport_submit (trans=<value optimized out>, rpchdr=<value optimized out>, rpchdrcount=<value optimized out>, proghdr=<value optimized out>, proghdrcount=<value optimized out>, progpayload=<value optimized out>, progpayloadcount=0, iobref=0x7f21388e0c40, priv=0x0) at rpcsvc.c:1006
#2  0x0000003706408b18 in rpcsvc_submit_generic (req=0x7f21401d502c, proghdr=0x1d56a80, hdrcount=<value optimized out>, payload=0x0, payloadcount=0, iobref=0x7f21388e0c40) at rpcsvc.c:1190
#3  0x0000003706408f46 in rpcsvc_error_reply (req=0x7f21401d502c) at rpcsvc.c:1238
#4  0x0000003706408fbb in rpcsvc_check_and_reply_error (ret=-1, frame=<value optimized out>, opaque=0x7f21401d502c) at rpcsvc.c:492
#5  0x0000003706057c3a in synctask_wrap (old_task=<value optimized out>) at syncop.c:335
#6  0x00000039fea43bf0 in ?? () from /lib64/libc.so.6
#7  0x0000000000000000 in ?? ()

glusterd on the other two nodes did not crash.

Expected results:
The deletes should not fail, and glusterd should not crash.

Additional info:
The issue looks similar to Bug# 1088355, and the fix for it (patch# 7579) has been posted upstream. Can you please try running this test case once the patch is merged upstream?
Patch http://review.gluster.org/#/c/7579/ has been merged upstream.
Setting flags required to add BZs to RHS 3.0 Errata
Seems to be working; I did not hit any crash.

Tested in version: glusterfs-3.6.0.10-1.el6rhs.x86_64

Moving to verified.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. http://rhn.redhat.com/errata/RHEA-2014-1278.html