Bug 1259992
| Summary: | Glusterd crashed during heals | | |
|---|---|---|---|
| Product: | [Red Hat Storage] Red Hat Gluster Storage | Reporter: | Bhaskarakiran <byarlaga> |
| Component: | glusterd | Assignee: | Satish Mohan <smohan> |
| Status: | CLOSED WONTFIX | QA Contact: | storage-qa-internal <storage-qa-internal> |
| Severity: | urgent | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | rhgs-3.1 | CC: | amukherj, mzywusko, nlevinki, rcyriac, rhinduja, sankarshan, sasundar, vbellur |
| Target Milestone: | --- | Keywords: | ZStream |
| Target Release: | --- | | |
| Hardware: | x86_64 | | |
| OS: | Linux | | |
| Whiteboard: | glusterd | | |
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2017-02-08 13:22:56 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 1260930 | | |
| Attachments: | new core (attachment 1072035) | | |
Description (Bhaskarakiran, 2015-09-04 05:25:29 UTC)
There is one more glusterd crash while enabling the heal with the `gluster v heal <volname> enable` command. Below is the backtrace. Let me know if I have to file a new bug for this.
Corefile: interstellar.lab.eng.blr.redhat.com:/core.8010 (login root/redhat) if it needs to be looked at.
```
(gdb) bt
#0  0x00007f7c5df3cf8b in __strcmp_sse42 () from /lib64/libc.so.6
#1  0x00007f7c542461e7 in glusterd_check_client_op_version_support (
    volname=0x7f7c3c5dabd0 "vol2", op_version=op_version@entry=30703,
    op_errstr=op_errstr@entry=0x7f7c40249720) at glusterd-utils.c:9930
#2  0x00007f7c5421b7f7 in glusterd_op_stage_set_volume (
    dict=dict@entry=0x7f7c3c38ddbc, op_errstr=op_errstr@entry=0x7f7c40249720)
    at glusterd-op-sm.c:1306
#3  0x00007f7c5421e2fb in glusterd_op_stage_validate (op=GD_OP_SET_VOLUME,
    dict=dict@entry=0x7f7c3c38ddbc, op_errstr=op_errstr@entry=0x7f7c40249720,
    rsp_dict=rsp_dict@entry=0x7f7c3c4d4d5c) at glusterd-op-sm.c:5406
#4  0x00007f7c5421e47f in glusterd_op_ac_stage_op (event=0x7f7c3c704190,
    ctx=0x7f7c3c5cb8d0) at glusterd-op-sm.c:5164
#5  0x00007f7c54224a4f in glusterd_op_sm () at glusterd-op-sm.c:7371
#6  0x00007f7c5420b9ab in __glusterd_handle_stage_op (req=req@entry=0x7f7c5fa6eb78)
    at glusterd-handler.c:1022
#7  0x00007f7c54209c00 in glusterd_big_locked_handler (req=0x7f7c5fa6eb78,
    actor_fn=0x7f7c5420b6c0 <__glusterd_handle_stage_op>) at glusterd-handler.c:83
#8  0x00007f7c5f794102 in synctask_wrap (old_task=<optimized out>) at syncop.c:381
#9  0x00007f7c5de520f0 in ?? () from /lib64/libc.so.6
#10 0x0000000000000000 in ?? ()
(gdb) q
```
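For readers unfamiliar with frame #1: the crash site is a strcmp() on a volume name taken from a connected-client transport entry. The sketch below is a hypothetical, simplified rendering of that traversal pattern (the type and function names are illustrative, not the actual glusterd code), showing how a freed or unlinked list entry turns into a SIGSEGV inside strcmp:

```c
#include <stdio.h>
#include <string.h>

/* Illustrative stand-ins for glusterd's client-transport types. */
struct peer_info {
        char volname[256];
        int  max_op_version;
};

struct client_xprt {
        struct client_xprt *next;
        struct peer_info    peerinfo;
};

/* Simplified op-version check in the spirit of frame #1: walk the
 * list of connected clients and compare each entry's volume name.
 * If another thread has already freed or unlinked an entry we are
 * still pointing at, strcmp() reads garbage memory and crashes,
 * matching the __strcmp_sse42 frame above. */
static int
check_clients_support (struct client_xprt *head,
                       const char *volname, int op_version)
{
        struct client_xprt *xprt;

        for (xprt = head; xprt != NULL; xprt = xprt->next) {
                if (strcmp (volname, xprt->peerinfo.volname) != 0)
                        continue;
                if (op_version > xprt->peerinfo.max_op_version)
                        return -1; /* connected client is too old */
        }
        return 0;
}

int
main (void)
{
        struct client_xprt c = { .next = NULL };
        strcpy (c.peerinfo.volname, "vol2");
        c.peerinfo.max_op_version = 30702;

        /* 30703 (the op_version from the backtrace) exceeds what
         * this client supports, so the check rejects it. */
        printf ("result: %d\n", check_clients_support (&c, "vol2", 30703));
        return 0;
}
```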
The current RCA outlook is that the (rpc_transport_t *)xprt object got corrupted. While debugging the core file, I saw that the xprt object had been deleted by something else: printing the xprt list in gdb showed the 0xbabebabe address, which is only assigned by a list_del (node deletion) operation. We are still analysing how xprt_list can point to a deleted object in the list while executing the heal disable command; further analysis is ongoing.

Created attachment 1072035 [details]
new core
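For context on the 0xbabebabe observation above: glusterfs's intrusive list helpers follow the Linux-kernel convention of poisoning a node's pointers when it is unlinked. The sketch below is a minimal, self-contained illustration of that pattern (the poison constants follow the style of glusterfs's list.h; the demo main() and its node names are hypothetical), showing why a poisoned address surfacing during traversal means the node was already removed with list_del:

```c
#include <stdio.h>

struct list_head {
        struct list_head *next;
        struct list_head *prev;
};

/* Unlink a node and poison its pointers, in the style of
 * glusterfs's list_del(): any later traversal that follows a
 * stale reference to the node dereferences the poison values
 * and crashes, which is the signature seen in this core. */
static inline void
list_del (struct list_head *old)
{
        old->prev->next = old->next;
        old->next->prev = old->prev;

        old->next = (void *)0xbabebabe; /* poison: node was deleted */
        old->prev = (void *)0xcafecafe;
}

int
main (void)
{
        /* Hypothetical three-node ring: head <-> a <-> b. */
        struct list_head head, a, b;
        head.next = &a; a.prev = &head;
        a.next = &b;    b.prev = &a;
        b.next = &head; head.prev = &b;

        list_del (&a);

        /* A racing reader still holding &a now sees the poison. */
        printf ("a.next after list_del: %p\n", (void *)a.next);
        return 0;
}
```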
Rahul Hinduja (comment #5): Observed the glusterd crash with bt:

```
#0  0x00007f45b5ba832d in __gf_free (free_ptr=0x7f459000a4b0) at mem-pool.c:313
#1  0x00007f45aa6393d0 in glusterd_friend_sm () at glusterd-sm.c:1250
#2  0x00007f45aa63269c in __glusterd_handle_incoming_unfriend_req (req=req@entry=0x7f45b5e9706c)
    at glusterd-handler.c:2597
#3  0x00007f45aa62cc00 in glusterd_big_locked_handler (req=0x7f45b5e9706c,
    actor_fn=0x7f45aa6324d0 <__glusterd_handle_incoming_unfriend_req>) at glusterd-handler.c:83
#4  0x00007f45b593d549 in rpcsvc_handle_rpc_call (svc=0x7f45b6544040,
    trans=trans@entry=0x7f4590000920, msg=msg@entry=0x7f4590010960) at rpcsvc.c:703
#5  0x00007f45b593d7ab in rpcsvc_notify (trans=0x7f4590000920, mydata=<optimized out>,
    event=<optimized out>, data=0x7f4590010960) at rpcsvc.c:797
#6  0x00007f45b593f873 in rpc_transport_notify (this=this@entry=0x7f4590000920,
    event=event@entry=RPC_TRANSPORT_MSG_RECEIVED, data=data@entry=0x7f4590010960)
    at rpc-transport.c:543
#7  0x00007f45a83b5bb6 in socket_event_poll_in (this=this@entry=0x7f4590000920) at socket.c:2290
#8  0x00007f45a83b8aa4 in socket_event_handler (fd=fd@entry=7, idx=idx@entry=2,
    data=0x7f4590000920, poll_in=1, poll_out=0, poll_err=0) at socket.c:2403
#9  0x00007f45b5bd66aa in event_dispatch_epoll_handler (event=0x7f45a61aae80,
    event_pool=0x7f45b6521c10) at event-epoll.c:575
#10 event_dispatch_epoll_worker (data=0x7f45b6544820) at event-epoll.c:678
#11 0x00007f45b49dddf5 in start_thread () from /lib64/libpthread.so.0
#12 0x00007f45b43241ad in clone () from /lib64/libc.so.6
(gdb)
```

Updating this bug with the core after the discussion with the assignee.

Bug https://bugzilla.redhat.com/show_bug.cgi?id=1262236 has been raised to deliver a workaround for this bug: BZ 1262236 is the workaround for BZ 1259992, and work on the RCA of BZ 1259992 will continue.

(In reply to Rahul Hinduja from comment #5)
> Observed the glusterd crash with bt: [backtrace quoted above]
> Updating this bug with core after the discussion with assignee

Rahul, could you provide me info on how I can access the core file?
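A note on the second crash signature above: dying inside __gf_free() itself is consistent with freeing a pointer whose allocation header was already trampled or already freed. The sketch below is a hypothetical illustration of the general header-magic pattern such checked allocators use (the constant and the names here are illustrative, not glusterfs's actual mem-pool code):

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <assert.h>

#define HDR_MAGIC 0xCAFEBABEu   /* illustrative magic word */

/* Each allocation is prefixed with a small header; the free
 * wrapper validates the magic before releasing the block, so a
 * double free or a stray write over the header blows up inside
 * the wrapper, matching the __gf_free frame above. */
struct alloc_hdr {
        unsigned int magic;
        size_t       size;
};

static void *
xmalloc (size_t size)
{
        struct alloc_hdr *hdr = malloc (sizeof (*hdr) + size);
        if (!hdr)
                return NULL;
        hdr->magic = HDR_MAGIC;
        hdr->size  = size;
        return hdr + 1; /* hand out the memory after the header */
}

static void
xfree (void *ptr)
{
        struct alloc_hdr *hdr;

        if (!ptr)
                return;
        hdr = (struct alloc_hdr *)ptr - 1;
        assert (hdr->magic == HDR_MAGIC); /* corruption/double free trips here */
        hdr->magic = 0;                   /* poison to catch a second free */
        free (hdr);
}

int
main (void)
{
        char *p = xmalloc (16);
        strcpy (p, "ok");
        puts (p);
        xfree (p);
        return 0;
}
```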
Gaurav, the core is attached with this mail. Please find it in the attachment.

This crash was observed when ping timeout was enabled for GlusterD-to-GlusterD communication. We don't have any future plan to enable this option again, hence closing this bug.
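For reference, the option being discussed is glusterd's management-plane ping timeout. A sketch of how it appears in /etc/glusterfs/glusterd.vol, assuming the stock volfile layout (a value of 0 keeps it disabled, which is the default being retained per the closing comment):

```
volume management
    type mgmt/glusterd
    option working-directory /var/lib/glusterd
    option ping-timeout 0
end-volume
```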