Bug 1545045
| Summary: | [Glusterd] glusterd crashed while doing in-service upgrade from 7.4+3.3.1_async to 7.5+3.4.0 | | |
|---|---|---|---|
| Product: | [Red Hat Storage] Red Hat Gluster Storage | Reporter: | Prasad Desala <tdesala> |
| Component: | glusterd | Assignee: | Atin Mukherjee <amukherj> |
| Status: | CLOSED DUPLICATE | QA Contact: | Bala Konda Reddy M <bmekala> |
| Severity: | high | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | rhgs-3.4 | CC: | rhs-bugs, storage-qa-internal, vbellur |
| Target Milestone: | --- | | |
| Target Release: | --- | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2018-02-14 12:10:29 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
(gdb) t a a bt
Thread 8 (Thread 0x7f79d3f1d700 (LWP 2149)):
#0 0x00007f79dc00e4fd in nanosleep () from /lib64/libc.so.6
#1 0x00007f79dc00e394 in sleep () from /lib64/libc.so.6
#2 0x00007f79dd94791d in pool_sweeper (arg=<optimized out>) at mem-pool.c:481
#3 0x00007f79dc77edd5 in start_thread () from /lib64/libpthread.so.0
#4 0x00007f79dc047b3d in clone () from /lib64/libc.so.6
Thread 7 (Thread 0x7f79d371c700 (LWP 2150)):
#0 0x00007f79dc782cf2 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x00007f79dd95a548 in syncenv_task (proc=proc@entry=0x5571dc672900) at syncop.c:603
#2 0x00007f79dd95b410 in syncenv_processor (thdata=0x5571dc672900) at syncop.c:695
#3 0x00007f79dc77edd5 in start_thread () from /lib64/libpthread.so.0
#4 0x00007f79dc047b3d in clone () from /lib64/libc.so.6
Thread 6 (Thread 0x7f79d2f1b700 (LWP 2151)):
#0 0x00007f79dc782cf2 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x00007f79dd95a548 in syncenv_task (proc=proc@entry=0x5571dc672cc0) at syncop.c:603
#2 0x00007f79dd95b410 in syncenv_processor (thdata=0x5571dc672cc0) at syncop.c:695
#3 0x00007f79dc77edd5 in start_thread () from /lib64/libpthread.so.0
#4 0x00007f79dc047b3d in clone () from /lib64/libc.so.6
Thread 5 (Thread 0x7f79cdc40700 (LWP 2162)):
#0 0x00007f79dc782945 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x00007f79d246996b in hooks_worker (args=<optimized out>) at glusterd-hooks.c:529
#2 0x00007f79dc77edd5 in start_thread () from /lib64/libpthread.so.0
#3 0x00007f79dc047b3d in clone () from /lib64/libc.so.6
Thread 4 (Thread 0x7f79d4f1f700 (LWP 2147)):
#0 0x00007f79dc785eed in nanosleep () from /lib64/libpthread.so.0
#1 0x00007f79dd92d126 in gf_timer_proc (data=0x5571dc6720e0) at timer.c:165
#2 0x00007f79dc77edd5 in start_thread () from /lib64/libpthread.so.0
#3 0x00007f79dc047b3d in clone () from /lib64/libc.so.6
Thread 3 (Thread 0x7f79dde01780 (LWP 2146)):
#0 0x00007f79dc77ff47 in pthread_join () from /lib64/libpthread.so.0
#1 0x00007f79dd97c978 in event_dispatch_epoll (event_pool=0x5571dc66aa30) at event-epoll.c:746
#2 0x00005571daf9a097 in main (argc=4, argv=<optimized out>) at glusterfsd.c:2570
Thread 2 (Thread 0x7f79d471e700 (LWP 2148)):
#0 0x00007f79dbfc61d0 in _IO_default_setbuf () from /lib64/libc.so.6
#1 0x00007f79dbfc2769 in __GI__IO_file_setbuf () from /lib64/libc.so.6
#2 0x00007f79dbfc6b52 in _IO_cleanup () from /lib64/libc.so.6
#3 0x00007f79dbf82b1b in __run_exit_handlers () from /lib64/libc.so.6
#4 0x00007f79dbf82bb7 in exit () from /lib64/libc.so.6
#5 0x00005571daf9d30b in cleanup_and_exit (signum=15) at glusterfsd.c:1444
#6 0x00005571daf9d405 in glusterfs_sigwaiter (arg=<optimized out>) at glusterfsd.c:2165
#7 0x00007f79dc77edd5 in start_thread () from /lib64/libpthread.so.0
#8 0x00007f79dc047b3d in clone () from /lib64/libc.so.6
Thread 1 (Thread 0x7f79cd43f700 (LWP 2163)):
#0 0x00007f79d1e18126 in rcu_read_unlock_bp () from /lib64/liburcu-bp.so.1
#1 0x00007f79d24130cd in glusterd_mgmt_handshake (this=this@entry=0x5571dc67b0b0, peerctx=peerctx@entry=0x5571dcde0410) at glusterd-handshake.c:2067
#2 0x00007f79d2413765 in __glusterd_peer_dump_version_cbk (req=req@entry=0x7f79b8005170, iov=iov@entry=0x7f79b80051b0, count=count@entry=1, myframe=myframe@entry=0x7f79b8004200)
at glusterd-handshake.c:2222
#3 0x00007f79d23fcaca in glusterd_big_locked_cbk (req=0x7f79b8005170, iov=0x7f79b80051b0, count=1, myframe=0x7f79b8004200, fn=0x7f79d2413610 <__glusterd_peer_dump_version_cbk>)
at glusterd-rpc-ops.c:223
#4 0x00007f79dd6e7d90 in rpc_clnt_handle_reply (clnt=clnt@entry=0x5571dcde1350, pollin=pollin@entry=0x7f79b8007e80) at rpc-clnt.c:778
#5 0x00007f79dd6e8077 in rpc_clnt_notify (trans=<optimized out>, mydata=0x5571dcde1380, event=<optimized out>, data=0x7f79b8007e80) at rpc-clnt.c:971
#6 0x00007f79dd6e3ea3 in rpc_transport_notify (this=this@entry=0x5571dcde1580, event=event@entry=RPC_TRANSPORT_MSG_RECEIVED, data=data@entry=0x7f79b8007e80) at rpc-transport.c:538
#7 0x00007f79cf5f44c6 in socket_event_poll_in (this=this@entry=0x5571dcde1580, notify_handled=<optimized out>) at socket.c:2315
#8 0x00007f79cf5f6a6c in socket_event_handler (fd=14, idx=5, gen=1, data=0x5571dcde1580, poll_in=1, poll_out=0, poll_err=0) at socket.c:2467
#9 0x00007f79dd97c344 in event_dispatch_epoll_handler (event=0x7f79cd43ee80, event_pool=0x5571dc66aa30) at event-epoll.c:583
#10 event_dispatch_epoll_worker (data=0x5571dcde3f80) at event-epoll.c:659
#11 0x00007f79dc77edd5 in start_thread () from /lib64/libpthread.so.0
#12 0x00007f79dc047b3d in clone () from /lib64/libc.so.6
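For reference, a backtrace like the one above can typically be regenerated from the dumped core once debug symbols are installed. A rough sketch follows; the core file path and the exact debuginfo package names are assumptions, not values taken from this report:

# Assumed core path; adjust to whatever abrt/systemd actually saved for PID 2146.
debuginfo-install -y glusterfs glusterfs-server userspace-rcu   # pull in symbols first
gdb /usr/sbin/glusterd /var/tmp/core.glusterd.2146 \
    -batch \
    -ex 'thread apply all bt' \
    -ex 'bt full'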
Description of problem:
=======================
glusterd crashed while doing an in-service upgrade from 7.4+3.3.1_async to 7.5+3.4.0.

Version-Release number of selected component (if applicable):
3.12.2-4.el7rhgs.x86_64

How reproducible:
Started an in-service upgrade on a six-node setup; glusterd crashed during the node 4 upgrade.

Steps to Reproduce:
===================
1) Have a six-node setup running 7.4+3.3.1_async.
2) Create 30 volumes (x2, x3, Distributed-Disperse) and enable brick-mux (see the command sketch at the end of this report).
3) Mount the volumes on clients and start IO from a few mount points.
4) Start the in-service upgrade.

Actual results:
===============
On the node where the upgrade is in progress, glusterd crashed.

Expected results:
=================
No crashes.

Output snips:
================
* Seen below message on the console,

  Cleanup    : libgcc-4.8.5-16.el7.x86_64                                752/752
/var/tmp/rpm-tmp.BkBr28: line 26:  2146 Segmentation fault      (core dumped) glusterd --xlator-option *.upgrade=on -N
rhel-75repo/productid                                           | 1.6 kB  00:00:00
  Verifying  : device-mapper-persistent-data-0.7.3-3.el7.x86_64            1/752

* glusterd.log
===============
->/lib64/libglusterfs.so.0(gf_timer_call_after+0x229) [0x7f79dd92d4a9] ) 0-timer: Either ctx is NULL or ctx cleanup started [Invalid argument]
pending frames:
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
patchset: git://git.gluster.org/glusterfs.git
signal received: 11
time of crash:
2018-02-14 06:32:24
configuration details:
argp 1
backtrace 1
dlfcn 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 3.12.2
/lib64/libglusterfs.so.0(_gf_msg_backtrace_nomem+0xa0)[0x7f79dd91f3f0]
/lib64/libglusterfs.so.0(gf_print_trace+0x334)[0x7f79dd929334]
/lib64/libc.so.6(+0x36280)[0x7f79dbf7f280]
/lib64/liburcu-bp.so.1(rcu_read_unlock_bp+0x16)[0x7f79d1e18126]
/usr/lib64/glusterfs/3.12.2/xlator/mgmt/glusterd.so(+0x8c0cd)[0x7f79d24130cd]
/usr/lib64/glusterfs/3.12.2/xlator/mgmt/glusterd.so(+0x8c765)[0x7f79d2413765]
/usr/lib64/glusterfs/3.12.2/xlator/mgmt/glusterd.so(+0x75aca)[0x7f79d23fcaca]
/lib64/libgfrpc.so.0(rpc_clnt_handle_reply+0x90)[0x7f79dd6e7d90]
/lib64/libgfrpc.so.0(rpc_clnt_notify+0x1e7)[0x7f79dd6e8077]
/lib64/libgfrpc.so.0(rpc_transport_notify+0x23)[0x7f79dd6e3ea3]
/usr/lib64/glusterfs/3.12.2/rpc-transport/socket.so(+0x74c6)[0x7f79cf5f44c6]
/usr/lib64/glusterfs/3.12.2/rpc-transport/socket.so(+0x9a6c)[0x7f79cf5f6a6c]
/lib64/libglusterfs.so.0(+0x88344)[0x7f79dd97c344]
/lib64/libpthread.so.0(+0x7dd5)[0x7f79dc77edd5]
/lib64/libc.so.6(clone+0x6d)[0x7f79dc047b3d]
---------

* Full BT
=============
(gdb) bt full
#0  0x00007f79d1e18126 in rcu_read_unlock_bp () from /lib64/liburcu-bp.so.1
No symbol table info available.
#1  0x00007f79d24130cd in glusterd_mgmt_handshake (this=this@entry=0x5571dc67b0b0, peerctx=peerctx@entry=0x5571dcde0410) at glusterd-handshake.c:2067
        frame = <optimized out>
        req = {hndsk = {hndsk_len = 57, hndsk_val = 0x7f79b8009a90 ""}}
        peerinfo = <optimized out>
        req_dict = 0x7f79b8009440
        ret = 0
        __FUNCTION__ = "glusterd_mgmt_handshake"
#2  0x00007f79d2413765 in __glusterd_peer_dump_version_cbk (req=req@entry=0x7f79b8005170, iov=iov@entry=0x7f79b80051b0, count=count@entry=1, myframe=myframe@entry=0x7f79b8004200)
    at glusterd-handshake.c:2222
        ret = 368
        rsp = {gfs_id = 0, op_ret = 0, op_errno = 0, prog = 0x7f79b80065b0}
        this = 0x5571dc67b0b0
        trav = 0x0
        next = 0x0
        frame = 0x7f79b8004200
        peerinfo = 0x5571dcdd6fe0
        peerctx = 0x5571dcde0410
        conf = 0x7f79ddc76050
        msg = '\000' <repeats 1023 times>
        __FUNCTION__ = "__glusterd_peer_dump_version_cbk"
#3  0x00007f79d23fcaca in glusterd_big_locked_cbk (req=0x7f79b8005170, iov=0x7f79b80051b0, count=1, myframe=0x7f79b8004200, fn=0x7f79d2413610 <__glusterd_peer_dump_version_cbk>)
    at glusterd-rpc-ops.c:223
        priv = 0x7f79ddc76050
        ret = -1
#4  0x00007f79dd6e7d90 in rpc_clnt_handle_reply (clnt=clnt@entry=0x5571dcde1350, pollin=pollin@entry=0x7f79b8007e80) at rpc-clnt.c:778
        conn = 0x5571dcde1380
        saved_frame = <optimized out>
        ret = 0
        req = 0x7f79b8005170
        xid = 2
        __FUNCTION__ = "rpc_clnt_handle_reply"
#5  0x00007f79dd6e8077 in rpc_clnt_notify (trans=<optimized out>, mydata=0x5571dcde1380, event=<optimized out>, data=0x7f79b8007e80) at rpc-clnt.c:971
        conn = 0x5571dcde1380
        clnt = 0x5571dcde1350
        ret = -1
        req_info = 0x0
        pollin = 0x7f79b8007e80
        clnt_mydata = 0x0
        old_THIS = 0x5571dc67b0b0
        __FUNCTION__ = "rpc_clnt_notify"
#6  0x00007f79dd6e3ea3 in rpc_transport_notify (this=this@entry=0x5571dcde1580, event=event@entry=RPC_TRANSPORT_MSG_RECEIVED, data=data@entry=0x7f79b8007e80) at rpc-transport.c:538
        ret = -1
        __FUNCTION__ = "rpc_transport_notify"
#7  0x00007f79cf5f44c6 in socket_event_poll_in (this=this@entry=0x5571dcde1580, notify_handled=<optimized out>) at socket.c:2315
        ret = <optimized out>
        pollin = <optimized out>
        priv = 0x5571dcde1fb0
        ctx = <optimized out>
#8  0x00007f79cf5f6a6c in socket_event_handler (fd=14, idx=5, gen=1, data=0x5571dcde1580, poll_in=1, poll_out=0, poll_err=0) at socket.c:2467
        this = 0x5571dcde1580
        priv = 0x5571dcde1fb0
        ret = 0
        ctx = 0x5571dc633010
        socket_closed = _gf_false
        notify_handled = _gf_false
        __FUNCTION__ = "socket_event_handler"
#9  0x00007f79dd97c344 in event_dispatch_epoll_handler (event=0x7f79cd43ee80, event_pool=0x5571dc66aa30) at event-epoll.c:583
        handler = 0x7f79cf5f68c0 <socket_event_handler>
        gen = 1
        slot = 0x5571dc6ad6d8
        data = 0x5571dcde1580
        ret = -1
        fd = 14
        ev_data = 0x7f79cd43ee84
        idx = 5
        handled_error_previously = _gf_false
#10 event_dispatch_epoll_worker (data=0x5571dcde3f80) at event-epoll.c:659
        event = {events = 1, data = {ptr = 0x100000005, fd = 5, u32 = 5, u64 = 4294967301}}
        ret = <optimized out>
        ev_data = 0x5571dcde3f80
        event_pool = 0x5571dc66aa30
        myindex = 1
        timetodie = 0
        __FUNCTION__ = "event_dispatch_epoll_worker"
#11 0x00007f79dc77edd5 in start_thread () from /lib64/libpthread.so.0
No symbol table info available.
#12 0x00007f79dc047b3d in clone () from /lib64/libc.so.6
No symbol table info available.
Seeing below warnings in glusterd logs,

[2018-02-13 11:39:13.218043] I [MSGID: 106502] [glusterd-handler.c:2834:__glusterd_handle_friend_update] 0-management: Received my uuid as Friend
[2018-02-13 11:39:13.246752] I [MSGID: 106493] [glusterd-rpc-ops.c:693:__glusterd_friend_update_cbk] 0-management: Received ACC from uuid: f4985910-8e61-42f7-951f-f614b32dac46
[2018-02-13 11:39:14.509711] I [MSGID: 106493] [glusterd-rpc-ops.c:693:__glusterd_friend_update_cbk] 0-management: Received ACC from uuid: f4985910-8e61-42f7-951f-f614b32dac46
[2018-02-14 05:35:49.442684] W [MSGID: 101095] [xlator.c:147:xlator_volopt_dynload] 0-xlator: /usr/lib64/glusterfs/3.8.4/xlator/features/ganesha.so: cannot open shared object file: No such file or directory
[2018-02-14 05:37:33.807570] W [socket.c:595:__socket_rwv] 0-management: readv on 10.70.42.173:24007 failed (No data available)
[2018-02-14 05:37:33.812968] I [MSGID: 106004] [glusterd-handler.c:6276:__glusterd_peer_rpc_notify] 0-management: Peer <dhcp42-173.lab.eng.blr.redhat.com> (<07f51608-cc82-45db-9681-f78beac254ad>), in state <Peer in Cluster>, has disconnected from glusterd.
[2018-02-14 05:37:33.817654] W [glusterd-locks.c:675:glusterd_mgmt_v3_unlock] (-->/usr/lib64/glusterfs/3.8.4/xlator/mgmt/glusterd.so(+0x226aa) [0x7f0d1098a6aa] -->/usr/lib64/glusterfs/3.8.4/xlator/mgmt/glusterd.so(+0x2c458) [0x7f0d10994458] -->/usr/lib64/glusterfs/3.8.4/xlator/mgmt/glusterd.so(+0xdccba) [0x7f0d10a44cba] ) 0-management: Lock for vol distrep10 not held
[2018-02-14 05:37:33.818250] W [MSGID: 106118] [glusterd-handler.c:6301:__glusterd_peer_rpc_notify] 0-management: Lock not released for distrep10
[2018-02-14 05:37:33.818487] W [glusterd-locks.c:675:glusterd_mgmt_v3_unlock] (-->/usr/lib64/glusterfs/3.8.4/xlator/mgmt/glusterd.so(+0x226aa) [0x7f0d1098a6aa] -->/usr/lib64/glusterfs/3.8.4/xlator/mgmt/glusterd.so(+0x2c458) [0x7f0d10994458] -->/usr/lib64/glusterfs/3.8.4/xlator/mgmt/glusterd.so(+0xdccba) [0x7f0d10a44cba] ) 0-management: Lock for vol distrep11 not held
[2018-02-14 05:37:33.818566] W [MSGID: 106118] [glusterd-handler.c:6301:__glusterd_peer_rpc_notify] 0-management: Lock not released for distrep11
[2018-02-14 05:37:33.818755] W [glusterd-locks.c:675:glusterd_mgmt_v3_unlock] (-->/usr/lib64/glusterfs/3.8.4/xlator/mgmt/glusterd.so(+0x226aa) [0x7f0d1098a6aa] -->/usr/lib64/glusterfs/3.8.4/xlator/mgmt/glusterd.so(+0x2c458) [0x7f0d10994458] -->/usr/lib64/glusterfs/3.8.4/xlator/mgmt/glusterd.so(+0xdccba) [0x7f0d10a44cba] ) 0-management: Lock for vol distrep12 not held
[2018-02-14 05:37:33.818816] W [MSGID: 106118] [glusterd-handler.c:6301:__glusterd_peer_rpc_notify] 0-management: Lock not released for distrep12
[2018-02-14 05:37:33.819061] W [glusterd-locks.c:675:glusterd_mgmt_v3_unlock] (-->/usr/lib64/glusterfs/3.8.4/xlator/mgmt/glusterd.so(+0x226aa) [0x7f0d1098a6aa] -->/usr/lib64/glusterfs/3.8.4/xlator/mgmt/glusterd.so(+0x2c458) [0x7f0d10994458] -->/usr/lib64/glusterfs/3.8.4/xlator/mgmt/glusterd.so(+0xdccba) [0x7f0d10a44cba] ) 0-management: Lock for vol distrep13 not held
[2018-02-14 05:37:33.819126] W [MSGID: 106118] [glusterd-handler.c:6301:__glusterd_peer_rpc_notify] 0-management: Lock not released for distrep13
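For completeness, here is a minimal, hedged sketch of the setup behind steps 2 and 3 of "Steps to Reproduce" above. The hostnames (server1..server3), brick paths, volume name, and mount point are illustrative assumptions; the actual test created 30 volumes mixing x2, x3, and Distributed-Disperse layouts, so the create/start/mount part would be repeated per volume.

# Enable brick multiplexing cluster-wide before starting the volumes.
gluster volume set all cluster.brick-multiplex on

# One x3 (replica 3) volume as an example; append "force" if the bricks
# live on the root filesystem.
gluster volume create testvol_rep3 replica 3 \
    server1:/bricks/brick0/testvol_rep3 \
    server2:/bricks/brick0/testvol_rep3 \
    server3:/bricks/brick0/testvol_rep3
gluster volume start testvol_rep3

# On a client: mount the volume and drive some IO.
mount -t glusterfs server1:/testvol_rep3 /mnt/testvol_rep3
dd if=/dev/urandom of=/mnt/testvol_rep3/file1 bs=1M count=100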