Bug 1043535
| Field | Value |
| --- | --- |
| Summary | glusterd crash seen in glusterfs 3.4.0.49rhs |
| Product | [Red Hat Storage] Red Hat Gluster Storage |
| Component | glusterfs |
| Version | 2.1 |
| Status | CLOSED ERRATA |
| Severity | high |
| Priority | high |
| Reporter | Shruti Sampat <ssampat> |
| Assignee | Kaushal <kaushal> |
| QA Contact | shylesh <shmohan> |
| CC | dtsang, grajaiya, knarra, mmahoney, pprakash, sasundar, sdharane, spradhan, vagarwal, vbellur |
| Keywords | ZStream |
| Target Release | RHGS 2.1.2 |
| Hardware | Unspecified |
| OS | Unspecified |
| Fixed In Version | glusterfs-3.4.0.52rhs-1.el6rhs |
| Doc Type | Bug Fix |
| Type | Bug |
| Clones | 1044327 (view as bug list) |
| Bug Depends On | 1044327 |
| Last Closed | 2014-02-25 08:09:07 UTC |
| Attachments | 837263 (core), 837656 (core - new) |
Description (Shruti Sampat, 2013-12-16 15:04:44 UTC)

Created attachment 837263 [details]
core
Shruti, this issue is a clone of https://bugzilla.redhat.com/show_bug.cgi?id=1024316, which has the BLOCKER flag set. I do not work on the glusterd component; I just came across this and loaded the core.

Quick core analysis:

```
(gdb) where
#0  synctask_yield (task=0x0) at syncop.c:247
#1  0x00007fd2cae55f45 in gd_stop_rebalance_process (volinfo=0x15a28c0) at glusterd-utils.c:9102
#2  0x00007fd2cae5d368 in gd_check_and_update_rebalance_info (old_volinfo=0x15a28c0, new_volinfo=0x15b1c40) at glusterd-utils.c:3241
#3  0x00007fd2cae6b40f in glusterd_import_friend_volume (vols=0x7fd2cd0cbef0, count=2) at glusterd-utils.c:3287
#4  0x00007fd2cae6b536 in glusterd_import_friend_volumes (vols=0x7fd2cd0cbef0) at glusterd-utils.c:3327
#5  0x00007fd2cae6b752 in glusterd_compare_friend_data (vols=0x7fd2cd0cbef0, status=0x7fffea9c40ec, hostname=0x15a3ac0 "10.70.37.169") at glusterd-utils.c:3471
#6  0x00007fd2cae478ac in glusterd_ac_handle_friend_add_req (event=<value optimized out>, ctx=0x160f7a0) at glusterd-sm.c:654
#7  0x00007fd2cae47f2e in glusterd_friend_sm () at glusterd-sm.c:1026
#8  0x00007fd2cae4672e in __glusterd_handle_incoming_friend_req (req=0x7fd2c96c702c) at glusterd-handler.c:2043
#9  0x00007fd2cae3619f in glusterd_big_locked_handler (req=0x7fd2c96c702c, actor_fn=0x7fd2cae46430 <__glusterd_handle_incoming_friend_req>) at glusterd-handler.c:77
#10 0x00000030a1809585 in rpcsvc_handle_rpc_call (svc=<value optimized out>, trans=<value optimized out>, msg=0x15cdf70) at rpcsvc.c:629
#11 0x00000030a18097c3 in rpcsvc_notify (trans=0x15d07f0, mydata=<value optimized out>, event=<value optimized out>, data=0x15cdf70) at rpcsvc.c:723
#12 0x00000030a180adf8 in rpc_transport_notify (this=<value optimized out>, event=<value optimized out>, data=<value optimized out>) at rpc-transport.c:512
#13 0x00007fd2c94bcd86 in socket_event_poll_in (this=0x15d07f0) at socket.c:2119
#14 0x00007fd2c94be69d in socket_event_handler (fd=<value optimized out>, idx=<value optimized out>, data=0x15d07f0, poll_in=1, poll_out=0, poll_err=0) at socket.c:2229
#15 0x00000030a1062387 in event_dispatch_epoll_handler (event_pool=0x1584ee0) at event-epoll.c:384
#16 event_dispatch_epoll (event_pool=0x1584ee0) at event-epoll.c:445
#17 0x00000000004069d7 in main (argc=2, argv=0x7fffea9c5ed8) at glusterfsd.c:2050
```

synctask_yield() segfaults on a NULL pointer dereference (task=0x0). It looks like the routine synctask_get() in the GD_SYNCOP() macro returned NULL, meaning the caller was not running inside a synctask. The glusterd expert(s) can tell what causes the synctask to be NULL; I would guess that validating the input and logging an error would be much better than assuming it is valid and crashing.

-Santosh
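To make the suggested hardening concrete, here is a minimal, self-contained C sketch of the pattern: check the result of synctask_get() before yielding instead of dereferencing a NULL task. The struct and both functions below are illustrative stand-ins modeled on libglusterfs' syncop API, not the actual glusterfs sources.

```c
#include <stdio.h>
#include <stdlib.h>

/* Illustrative stand-in for the synctask handle defined in libglusterfs'
 * syncop code. */
struct synctask {
        int id;
};

/* In glusterfs, synctask_get() returns the synctask bound to the calling
 * thread, or NULL when the caller is not running inside a synctask, which
 * is exactly the state the backtrace above shows (task=0x0). */
static struct synctask *
synctask_get (void)
{
        return NULL; /* simulate the failure mode */
}

static void
synctask_yield (struct synctask *task)
{
        /* The real function dereferences task, so task == NULL is the
         * segfault seen in frame #0. */
        printf ("yielding task %d\n", task->id);
}

int
main (void)
{
        struct synctask *task = synctask_get ();

        /* The suggested validation: log and bail out instead of assuming
         * the task exists. */
        if (task == NULL) {
                fprintf (stderr,
                         "error: not running in a synctask, refusing to yield\n");
                return EXIT_FAILURE;
        }

        synctask_yield (task);
        return EXIT_SUCCESS;
}
```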
Saw another crash later on the same server:

```
pending frames:
frame : type(0) op(0)
frame : type(0) op(0)

patchset: git://git.gluster.com/glusterfs.git
signal received: 11
time of crash: 2013-12-16 22:13:49
configuration details:
argp 1
backtrace 1
dlfcn 1
fdatasync 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 3.4.0.49rhs
/lib64/libc.so.6[0x309fc32960]
/usr/lib64/glusterfs/3.4.0.49rhs/xlator/mgmt/glusterd.so(__glusterd_defrag_notify+0x1d0)[0x7fd916e095d0]
/usr/lib64/glusterfs/3.4.0.49rhs/xlator/mgmt/glusterd.so(glusterd_big_locked_notify+0x60)[0x7fd916db93c0]
/usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0x109)[0x30a180f539]
/usr/lib64/libgfrpc.so.0(rpc_transport_notify+0x28)[0x30a180adf8]
/usr/lib64/glusterfs/3.4.0.49rhs/rpc-transport/socket.so(+0x557c)[0x7fd91543c57c]
/usr/lib64/glusterfs/3.4.0.49rhs/rpc-transport/socket.so(+0xa5b8)[0x7fd9154415b8]
/usr/lib64/libglusterfs.so.0[0x30a1062387]
/usr/sbin/glusterd(main+0x6c7)[0x4069d7]
/lib64/libc.so.6(__libc_start_main+0xfd)[0x309fc1ecdd]
/usr/sbin/glusterd[0x404619]
---------
```

Saw another crash; find the core attached.

Created attachment 837656 [details]
core - new
Patch posted for review at https://code.engineering.redhat.com/gerrit/17693

As per discussion with Krishnan, the simplified steps to reproduce the problem are as follows (a hypothetical command-level version of scenario 2 is sketched at the end of this report):

Scenario 1
----------
1. Create a distributed-replicate volume.
2. Run rebalance and remove-brick on the volume.
3. Stop the volume and delete the volume.
4. Run some gluster commands.

Result: no crash in glusterd.

Scenario 2
----------
1. Create a distributed volume on a 2-node cluster.
2. Add a brick and run rebalance on the volume.
3. Bring down one of the nodes.
4. While the node is down, run a volume set command from the other node.
5. After the node comes back, run some gluster commands.

Result: no crash in glusterd.

Verified on 3.4.0.54rhs-2.el6rhs.x86_64.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHEA-2014-0208.html
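For reference, here is a hypothetical gluster CLI transcript of scenario 2 from the verification steps above. The volume name, host names, brick paths, and the particular volume option are illustrative stand-ins, not values from the original report.

```sh
# Hypothetical reproduction of scenario 2; run from node1 unless noted.

# 1. Create (and start) a distributed volume on a 2-node cluster.
gluster volume create testvol node1:/bricks/b1 node2:/bricks/b2
gluster volume start testvol

# 2. Add a brick and run rebalance on the volume.
gluster volume add-brick testvol node1:/bricks/b3
gluster volume rebalance testvol start

# 3. Bring down one of the nodes (e.g. stop glusterd on node2).

# 4. While node2 is down, run a volume set command from node1.
gluster volume set testvol cluster.min-free-disk 10

# 5. After node2 comes back, run some gluster commands and confirm that
#    glusterd does not crash.
gluster volume status testvol
gluster volume info testvol
```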