Bug 1209461
Summary: | BVT: glusterd crashed and dumped during upgrade (on rhel7.1 server) | ||||||
---|---|---|---|---|---|---|---|
Product: | [Community] GlusterFS | Reporter: | Apeksha <akhakhar> | ||||
Component: | glusterd | Assignee: | Anand Nekkunti <anekkunt> | ||||
Status: | CLOSED CURRENTRELEASE | QA Contact: | |||||
Severity: | medium | Docs Contact: | |||||
Priority: | low | ||||||
Version: | mainline | CC: | akhakhar, amukherj, anekkunt, bugs, gluster-bugs, kripper, nsathyan, sasundar, vbellur | ||||
Target Milestone: | --- | Keywords: | Reopened, Triaged | ||||
Target Release: | --- | ||||||
Hardware: | x86_64 | ||||||
OS: | Linux | ||||||
Whiteboard: | |||||||
Fixed In Version: | glusterfs-3.8rc2 | Doc Type: | Bug Fix | ||||
Doc Text: | Story Points: | --- | |||||
Clone Of: | |||||||
: | 1230026 1230195 | Environment: | |||||
Last Closed: | 2016-06-16 12:47:47 UTC | Type: | Bug | ||||
Bug Blocks: | 1230026, 1230195 | ||||||
Attachments: |
|
The crash is caused by a race between exit and a socket event.

```
(gdb) thr a a bt

Thread 7 (Thread 0x7f5ca95b3700 (LWP 13303)):
#0  0x00007f5cb7823705 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f5cad5af353 in hooks_worker (args=<optimized out>) at glusterd-hooks.c:501
#2  0x00007f5cb781fdf5 in start_thread () from /lib64/libpthread.so.0
#3  0x00007f5cb71661ad in clone () from /lib64/libc.so.6

Thread 6 (Thread 0x7f5caf040700 (LWP 13151)):
#0  0x00007f5cab879859 in __do_global_dtors_aux () from /lib64/libselinux.so.1
#1  0x00007f5cb8744b5a in _dl_fini () from /lib64/ld-linux-x86-64.so.2
#2  0x00007f5cb70a8e49 in __run_exit_handlers () from /lib64/libc.so.6
#3  0x00007f5cb70a8e95 in exit () from /lib64/libc.so.6
#4  0x00007f5cb896253a in cleanup_and_exit (signum=<optimized out>) at glusterfsd.c:1242
#5  0x00007f5cb8962625 in glusterfs_sigwaiter (arg=<optimized out>) at glusterfsd.c:1983
#6  0x00007f5cb781fdf5 in start_thread () from /lib64/libpthread.so.0
#7  0x00007f5cb71661ad in clone () from /lib64/libc.so.6

Thread 5 (Thread 0x7f5caf841700 (LWP 13150)):
#0  0x00007f5cb782699d in nanosleep () from /lib64/libpthread.so.0
#1  0x00007f5cb84c7224 in gf_timer_proc (ctx=0x7f5cba015010) at timer.c:191
#2  0x00007f5cb781fdf5 in start_thread () from /lib64/libpthread.so.0
#3  0x00007f5cb71661ad in clone () from /lib64/libc.so.6

Thread 4 (Thread 0x7f5cae83f700 (LWP 13152)):
#0  0x00007f5cb7823ab2 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f5cb84e9028 in syncenv_task (proc=proc@entry=0x7f5cba042a20) at syncop.c:591
#2  0x00007f5cb84e9c90 in syncenv_processor (thdata=0x7f5cba042a20) at syncop.c:683
#3  0x00007f5cb781fdf5 in start_thread () from /lib64/libpthread.so.0
#4  0x00007f5cb71661ad in clone () from /lib64/libc.so.6

Thread 3 (Thread 0x7f5cae03e700 (LWP 13153)):
#0  0x00007f5cb7823ab2 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f5cb84e9028 in syncenv_task (proc=proc@entry=0x7f5cba042de0) at syncop.c:591
#2  0x00007f5cb84e9c90 in syncenv_processor (thdata=0x7f5cba042de0) at syncop.c:683
#3  0x00007f5cb781fdf5 in start_thread () from /lib64/libpthread.so.0
#4  0x00007f5cb71661ad in clone () from /lib64/libc.so.6

Thread 2 (Thread 0x7f5cb8946740 (LWP 13149)):
#0  0x00007f5cb7820f27 in pthread_join () from /lib64/libpthread.so.0
#1  0x00007f5cb8504f25 in event_dispatch_epoll (event_pool=0x7f5cba033c50) at event-epoll.c:759
#2  0x00007f5cb895f61a in main (argc=4, argv=0x7fff8e7dc2b8) at glusterfsd.c:2313

Thread 1 (Thread 0x7f5ca8db2700 (LWP 13304)):
#0  0x00007f5cacc66c3b in rcu_bp_register () from /lib64/liburcu-bp.so.1
#1  0x00007f5cacc66f7e in rcu_read_lock_bp () from /lib64/liburcu-bp.so.1
#2  0x00007f5cad52d5d6 in __glusterd_peer_rpc_notify (rpc=rpc@entry=0x7f5cba0a17a0, mydata=mydata@entry=0x7f5cba0964a0, event=event@entry=RPC_CLNT_CONNECT, data=data@entry=0x0) at glusterd-handler.c:4681
#3  0x00007f5cad5250ec in glusterd_big_locked_notify (rpc=0x7f5cba0a17a0, mydata=0x7f5cba0964a0, event=RPC_CLNT_CONNECT, data=0x0, notify_fn=0x7f5cad52d580 <__glusterd_peer_rpc_notify>) at glusterd-handler.c:71
#4  0x00007f5cb827e610 in rpc_clnt_notify (trans=<optimized out>, mydata=0x7f5cba0a17d0, event=<optimized out>, data=<optimized out>) at rpc-clnt.c:926
#5  0x00007f5cb827a4c3 in rpc_transport_notify (this=this@entry=0x7f5cba0a4930, event=event@entry=RPC_TRANSPORT_CONNECT, data=data@entry=0x7f5cba0a4930) at rpc-transport.c:543
#6  0x00007f5caad75a27 in socket_connect_finish (this=this@entry=0x7f5cba0a4930) at socket.c:2366
#7  0x00007f5caad7af7f in socket_event_handler (fd=fd@entry=11, idx=idx@entry=2, data=0x7f5cba0a4930, poll_in=0, poll_out=4, poll_err=0) at socket.c:2396
#8  0x00007f5cb8504c1a in event_dispatch_epoll_handler (event=0x7f5ca8db1ec0, event_pool=0x7f5cba033c50) at event-epoll.c:572
#9  event_dispatch_epoll_worker (data=0x7f5cba0425b0) at event-epoll.c:674
#10 0x00007f5cb781fdf5 in start_thread () from /lib64/libpthread.so.0
#11 0x00007f5cb71661ad in clone () from /lib64/libc.so.6
(gdb)
```

As the backtrace shows, thread 6 is in the middle of exiting the process. It has already run exit handlers that release library resources, including those of liburcu. By the time thread 1 calls rcu_bp_register(), the liburcu resources have been cleaned up. rcu_bp_register() tries to access these non-existent resources, which leads to the segmentation fault.

Races like this are hard to fix. As this race and crash happen when the process is almost at the point of stopping, it doesn't have any serious impact on functionality apart from the core file and the log message. I'm setting a lower priority for this bug.

The fix should be simple enough. GlusterD's fini() doesn't stop the TCP socket listener. If it did, the above situation wouldn't arise, as the listener would have been stopped before the liburcu exit handler was called.

REVIEW: http://review.gluster.org/10197 (glusterd: This patch stops tcp/ip listeners during glusterd exit.) posted (#1) for review on master by Anand Nekkunti (anekkunt)
REVIEW: http://review.gluster.org/10197 (glusterd: Stop tcp/ip listeners during glusterd exit) posted (#2) for review on master by Anand Nekkunti (anekkunt)
REVIEW: http://review.gluster.org/10197 (glusterd: Stop tcp/ip listeners during glusterd exit) posted (#3) for review on master by Anand Nekkunti (anekkunt)
REVIEW: http://review.gluster.org/10197 (glusterd: Stop tcp/ip listeners during glusterd exit.) posted (#4) for review on master by Anand Nekkunti (anekkunt)
REVIEW: http://review.gluster.org/10758 (ligglusterfs: Enabling the fini() in cleanup_and_exit()) posted (#1) for review on master by Anand Nekkunti (anekkunt)
REVIEW: http://review.gluster.org/10758 (libglusterfs: Enabling the fini() in cleanup_and_exit()) posted (#2) for review on master by Anand Nekkunti (anekkunt)
REVIEW: http://review.gluster.org/10758 (libglusterfs: Enabling the fini() in cleanup_and_exit()) posted (#3) for review on master by Anand Nekkunti (anekkunt)
REVIEW: http://review.gluster.org/10758 (libglusterfs: Enabling the fini() in cleanup_and_exit()) posted (#4) for review on master by Anand Nekkunti (anekkunt)
REVIEW: http://review.gluster.org/10758 (libglusterfs: Enabling the fini() in cleanup_and_exit()) posted (#5) for review on master by Anand Nekkunti (anekkunt)
REVIEW: http://review.gluster.org/10758 (libglusterfs: Enabling the fini() in cleanup_and_exit()) posted (#6) for review on master by Anand Nekkunti (anekkunt)
REVIEW: http://review.gluster.org/10758 (libglusterfs: Enabling the fini() in cleanup_and_exit()) posted (#7) for review on master by Anand Nekkunti (anekkunt)
REVIEW: http://review.gluster.org/10758 (libglusterfs: Enabling the fini() in cleanup_and_exit()) posted (#9) for review on master by Anand Nekkunti (anekkunt)
REVIEW: http://review.gluster.org/10758 (libglusterfs: Enabling the fini() in cleanup_and_exit()) posted (#10) for review on master by Anand Nekkunti (anekkunt)
REVIEW: http://review.gluster.org/10197 (glusterd: Stop tcp/ip listeners during glusterd exit) posted (#6) for review on master by Anand Nekkunti (anekkunt)
REVIEW: http://review.gluster.org/10894 (bglusterfs: Enabling the fini() in cleanup_and_exit()) posted (#2) for review on master by Anand Nekkunti (anekkunt)
REVIEW: http://review.gluster.org/10894 (bglusterfs: Enabling the fini() in cleanup_and_exit()) posted (#3) for review on master by Anand Nekkunti (anekkunt)
REVIEW: http://review.gluster.org/10197 (glusterd: Stop tcp/ip listeners during glusterd exit) posted (#7) for review on master by Anand Nekkunti (anekkunt)
REVIEW: http://review.gluster.org/10894 (bglusterfs: Enabling the fini() in cleanup_and_exit()) posted (#4) for review on master by Anand Nekkunti (anekkunt)
REVIEW: http://review.gluster.org/10197 (glusterd: Stop tcp/ip listeners during glusterd exit.) posted (#8) for review on master by Anand Nekkunti (anekkunt)
REVIEW: http://review.gluster.org/10894 (libglusterfs: Enabling the fini() in cleanup_and_exit()) posted (#5) for review on master by Anand Nekkunti (anekkunt)
REVIEW: http://review.gluster.org/10197 (glusterd: Stop tcp/ip listeners during glusterd exit) posted (#9) for review on master by Anand Nekkunti (anekkunt)
REVIEW: http://review.gluster.org/10894 (libglusterfs: Enabling the fini() in cleanup_and_exit()) posted (#6) for review on master by Anand Nekkunti (anekkunt)
REVIEW: http://review.gluster.org/10894 (libglusterfs: Enabling the fini() in cleanup_and_exit()) posted (#7) for review on master by Anand Nekkunti (anekkunt)
REVIEW: http://review.gluster.org/10197 (glusterd: Stop tcp/ip listeners during glusterd exit) posted (#10) for review on master by Anand Nekkunti (anekkunt)

*** Bug 1220623 has been marked as a duplicate of this bug. ***

REVIEW: http://review.gluster.org/10197 (glusterd: Stop tcp/ip listeners during glusterd exit) posted (#11) for review on master by Atin Mukherjee (amukherj)
REVIEW: http://review.gluster.org/10894 (glusterfsd: Enabling the fini() in cleanup_and_exit()) posted (#8) for review on master by Kaushal M (kaushal)
REVIEW: http://review.gluster.org/10894 (glusterfsd: Enabling the fini() in cleanup_and_exit()) posted (#9) for review on master by Kaushal M (kaushal)
REVIEW: http://review.gluster.org/10197 (glusterd: Stop tcp/ip listeners during glusterd exit) posted (#12) for review on master by Krishnan Parthasarathi (kparthas)
REVIEW: http://review.gluster.org/10197 (glusterd: Stop tcp/ip listeners during glusterd exit) posted (#13) for review on master by Anand Nekkunti (anekkunt)
REVIEW: http://review.gluster.org/10197 (glusterd: Stop tcp/ip listeners during glusterd exit) posted (#14) for review on master by Anand Nekkunti (anekkunt)
REVIEW: http://review.gluster.org/10197 (glusterd: Stop tcp/ip listeners during glusterd exit) posted (#15) for review on master by Anand Nekkunti (anekkunt)
REVIEW: http://review.gluster.org/10197 (glusterd: Stop tcp/ip listeners during glusterd exit) posted (#16) for review on master by Anand Nekkunti (anekkunt)
REVIEW: http://review.gluster.org/10894 (libglusterfs: Enabling the fini() in cleanup_and_exit()) posted (#10) for review on master by Anand Nekkunti (anekkunt)
REVIEW: http://review.gluster.org/10197 (glusterd: Stop tcp/ip listeners during glusterd exit) posted (#17) for review on master by Anand Nekkunti (anekkunt)
REVIEW: http://review.gluster.org/10197 (glusterd: Stop tcp/ip listeners during glusterd exit) posted (#18) for review on master by Anand Nekkunti (anekkunt)
REVIEW: http://review.gluster.org/10197 (glusterd: Stop tcp/ip listeners during glusterd exit) posted (#19) for review on master by Anand Nekkunti (anekkunt)

The fix for this BZ is already present in a GlusterFS release. A clone of this BZ has been fixed in a GlusterFS release and closed. Hence closing this mainline BZ as well.

This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.8.0, please open a new bug report.

glusterfs-3.8.0 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://blog.gluster.org/2016/06/glusterfs-3-8-released/
[2] http://thread.gmane.org/gmane.comp.file-systems.gluster.user |
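For readers following the analysis in this thread, the ordering the proposed fix relies on (stop delivering socket events before the state they touch is released) can be sketched in a few lines of C. This is only a single-threaded model with hypothetical names (`struct registry`, `struct listener`, `dispatch_event`, `shutdown_in_order`, `late_event_demo`); it is not GlusterD code and does not use liburcu. It is meant to show why a late connect event becomes harmless once the listener is stopped first.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for library state that is released at exit
 * (the role liburcu's internal data plays in the backtrace). */
struct registry {
    int readers;
};

/* Hypothetical listener: events are delivered only while `listening`. */
struct listener {
    bool listening;
    struct registry *reg;
    int delivered;   /* events that reached the handler */
    int dropped;     /* events refused after the listener was stopped */
};

/* Event handler: touches the registry, much like a notify callback
 * entering a read-side critical section would. */
static void on_connect(struct listener *l)
{
    l->reg->readers++;
    l->delivered++;
}

/* Dispatcher: refuses events once the listener has been stopped, so the
 * handler can never run against a released registry. */
static void dispatch_event(struct listener *l)
{
    if (!l->listening) {
        l->dropped++;
        return;
    }
    on_connect(l);
}

/* fini()-style shutdown: stop the listener first, then release state. */
static void shutdown_in_order(struct listener *l)
{
    l->listening = false;
    free(l->reg);
    l->reg = NULL;
}

/* Returns 0 if one event was delivered before shutdown and the event
 * arriving after shutdown was dropped; -1 if allocation failed. */
int late_event_demo(void)
{
    struct listener l = { .listening = true };

    l.reg = calloc(1, sizeof(*l.reg));
    if (l.reg == NULL)
        return -1;

    dispatch_event(&l);      /* normal event, handled */
    shutdown_in_order(&l);   /* listener stopped, registry released */
    dispatch_event(&l);      /* late event, dropped instead of crashing */

    return (l.delivered == 1 && l.dropped == 1) ? 0 : 1;
}
```

Calling `late_event_demo()` from a `main` of your own returns 0 when the first event is handled and the late one is dropped. In a real multi-threaded daemon, stopping the listener would also have to wait for handlers already running on other threads, which this model does not cover.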
Created attachment 1011735: Backtrace of the core.

Description of problem:
glusterd has crashed and dumped core.

Core was generated by `glusterd --xlator-option *.upgrade=on -N'

Backtrace of the core:
(gdb) bt
#0  0x00007f5cacc66c3b in rcu_bp_register () from /lib64/liburcu-bp.so.1
#1  0x00007f5cacc66f7e in rcu_read_lock_bp () from /lib64/liburcu-bp.so.1
#2  0x00007f5cad52d5d6 in __glusterd_peer_rpc_notify (rpc=rpc@entry=0x7f5cba0a17a0, mydata=mydata@entry=0x7f5cba0964a0, event=event@entry=RPC_CLNT_CONNECT, data=data@entry=0x0) at glusterd-handler.c:4681
#3  0x00007f5cad5250ec in glusterd_big_locked_notify (rpc=0x7f5cba0a17a0, mydata=0x7f5cba0964a0, event=RPC_CLNT_CONNECT, data=0x0, notify_fn=0x7f5cad52d580 <__glusterd_peer_rpc_notify>) at glusterd-handler.c:71
#4  0x00007f5cb827e610 in rpc_clnt_notify (trans=<optimized out>, mydata=0x7f5cba0a17d0, event=<optimized out>, data=<optimized out>) at rpc-clnt.c:926
#5  0x00007f5cb827a4c3 in rpc_transport_notify (this=this@entry=0x7f5cba0a4930, event=event@entry=RPC_TRANSPORT_CONNECT, data=data@entry=0x7f5cba0a4930) at rpc-transport.c:543
#6  0x00007f5caad75a27 in socket_connect_finish (this=this@entry=0x7f5cba0a4930) at socket.c:2366
#7  0x00007f5caad7af7f in socket_event_handler (fd=fd@entry=11, idx=idx@entry=2, data=0x7f5cba0a4930, poll_in=0, poll_out=4, poll_err=0) at socket.c:2396
#8  0x00007f5cb8504c1a in event_dispatch_epoll_handler (event=0x7f5ca8db1ec0, event_pool=0x7f5cba033c50) at event-epoll.c:572
#9  event_dispatch_epoll_worker (data=0x7f5cba0425b0) at event-epoll.c:674
#10 0x00007f5cb781fdf5 in start_thread () from /lib64/libpthread.so.0
#11 0x00007f5cb71661ad in clone () from /lib64/libc.so.6

Version-Release number of selected component (if applicable):
Upstream glusterfs 3.7 on a RHEL 7.1 server

How reproducible:

Steps to Reproduce:
1. Not a manual process
2. Running BVT on a RHEL 7.1 server with upstream glusterfs 3.7 packages
3. Received a core from one of the servers when fssanity tests failed after a watchdog timeout.

Actual results:

Expected results:
Server not to crash.

Additional info: