Bug 1099833 - [SNAPSHOT]: glusterd crashed while running snapshot creation for all the volumes while arequal was in progress
Summary: [SNAPSHOT]: glusterd crashed while running snapshot creation for all the volumes while arequal was in progress
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: snapshot
Version: rhgs-3.0
Hardware: x86_64
OS: Linux
Priority: high
Severity: high
Target Milestone: ---
Target Release: RHGS 3.0.0
Assignee: Avra Sengupta
QA Contact: Rahul Hinduja
URL:
Whiteboard: SNAPSHOT
Depends On:
Blocks: 1100325
 
Reported: 2014-05-21 10:12 UTC by Rahul Hinduja
Modified: 2016-09-17 13:03 UTC (History)
6 users

Fixed In Version: glusterfs-3.6.0.11-1.el6rhs
Doc Type: Bug Fix
Doc Text:
Clone Of:
: 1100325 (view as bug list)
Environment:
Last Closed: 2014-09-22 19:39:03 UTC
Embargoed:




Links
System ID: Red Hat Product Errata RHEA-2014:1278
Status: SHIPPED_LIVE (priority: normal)
Summary: Red Hat Storage Server 3.0 bug fix and enhancement update
Last Updated: 2014-09-22 23:26:55 UTC

Description Rahul Hinduja 2014-05-21 10:12:38 UTC
Description of problem:
=======================

In a scenario where the system was creating snapshots of 4 volumes simultaneously while IO from the clients was in progress, one of the glusterd processes crashed with the following backtrace:

(gdb) bt
#0  glusterd_handle_mgmt_v3_unlock_fn (req=0x7f47de1aa664) at glusterd-mgmt-handler.c:873
#1  0x00007f47de911acf in glusterd_big_locked_handler (req=0x7f47de1aa664, actor_fn=0x7f47de9c41f0 <glusterd_handle_mgmt_v3_unlock_fn>) at glusterd-handler.c:81
#2  0x000000384f25b742 in synctask_wrap (old_task=<value optimized out>) at syncop.c:333
#3  0x000000384de43bf0 in ?? () from /lib64/libc-2.12.so
#4  0x0000000000000000 in ?? ()
(gdb) f 0
#0  glusterd_handle_mgmt_v3_unlock_fn (req=0x7f47de1aa664) at glusterd-mgmt-handler.c:873
873	        gf_log (this->name, GF_LOG_TRACE, "Returning %d", ret);
(gdb) l
868	                        GF_FREE (ctx);
869	        }
870	
871	        free (lock_req.dict.dict_val);
872	
873	        gf_log (this->name, GF_LOG_TRACE, "Returning %d", ret);
874	        return ret;
875	}
876	
877	int
(gdb) 



Version-Release number of selected component (if applicable):
==============================================================

glusterfs-3.6.0.4-1.el6rhs.x86_64


Steps Carried:
==============
1. Create a 4-node clustered system.
2. Create and start 4 volumes on the system, named vol0 to vol3.
3. Mount the volumes (FUSE and NFS).
4. Start arequal from all 8 mounts of the 4 volumes [2 (FUSE+NFS) * 4 (volumes) = 8].
5. While IO is in progress, start snapshot creation of all 4 volumes simultaneously from different nodes in the cluster (see the sketch after this list).


Actual results:
===============
After a few snapshots, glusterd crashed.


Expected results:
=================

glusterd should not crash


Additional info:
================

Log snippet:
============

[2014-05-20 13:23:39.420078] E [glusterd-mgmt.c:116:gd_mgmt_v3_collate_errors] 0-management: Unlocking failed on 10.70.42.175. Please check log file for details.
[2014-05-20 13:23:43.583228] E [glusterd-mgmt-handler.c:643:glusterd_handle_post_validate_fn] 0-management: Failed to decode post validation request received from peer
[2014-05-20 13:23:43.583346] E [rpcsvc.c:533:rpcsvc_check_and_reply_error] 0-rpcsvc: rpc actor failed to complete successfully
[2014-05-20 13:23:43.584277] I [glusterd-snapshot.c:4625:glusterd_do_snap_cleanup] 0-management: snap b163 is not found
[2014-05-20 13:23:43.584345] E [glusterd-snapshot.c:5931:glusterd_snapshot_create_postvalidate] 0-management: unable to find snap b163
[2014-05-20 13:23:43.584552] I [glusterd-rpc-ops.c:556:__glusterd_friend_update_cbk] 0-management: Received ACC from uuid: 3801feb3-7066-4c86-996b-366e71ab3dac
[2014-05-20 13:23:43.585458] E [glusterd-mgmt.c:1532:glusterd_mgmt_v3_release_peer_locks] 0-management: Unlock failed on peers
[2014-05-20 13:23:43.587079] E [glusterd-mgmt-handler.c:810:glusterd_handle_mgmt_v3_unlock_fn] 0-management: Failed to decode unlock request received from peer
pending frames:
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
patchset: git://git.gluster.com/glusterfs.git
signal received: 11
time of crash: 
2014-05-20 13:23:43
configuration details:
argp 1
backtrace 1
dlfcn 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 3.6.0.4
/usr/lib64/libglusterfs.so.0(_gf_msg_backtrace_nomem+0xb6)[0x384f21fe56]
/usr/lib64/libglusterfs.so.0(gf_print_trace+0x33f)[0x384f23a28f]
/lib64/libc.so.6[0x384de329a0]
/usr/lib64/glusterfs/3.6.0.4/xlator/mgmt/glusterd.so(+0xe2438)[0x7f47de9c4438]
/usr/lib64/glusterfs/3.6.0.4/xlator/mgmt/glusterd.so(glusterd_big_locked_handler+0x3f)[0x7f47de911acf]
/usr/lib64/libglusterfs.so.0(synctask_wrap+0x12)[0x384f25b742]
/lib64/libc.so.6[0x384de43bf0]

Comment 3 Avra Sengupta 2014-05-26 06:35:45 UTC
Fix at https://code.engineering.redhat.com/gerrit/25660

Comment 4 Rahul Hinduja 2014-06-05 11:50:43 UTC
Verified with build: glusterfs-3.6.0.12-1.el6rhs.x86_64

Did not observe the glusterd crash with the steps mentioned above.

Moving the bug to verified state.

Comment 6 errata-xmlrpc 2014-09-22 19:39:03 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHEA-2014-1278.html

