Bug 1209461

Summary: BVT: glusterd crashed and dumped during upgrade (on rhel7.1 server)
Product: [Community] GlusterFS
Reporter: Apeksha <akhakhar>
Component: glusterd
Assignee: Anand Nekkunti <anekkunt>
Status: CLOSED CURRENTRELEASE
Severity: medium
Priority: low
Version: mainline
CC: akhakhar, amukherj, anekkunt, bugs, gluster-bugs, kripper, nsathyan, sasundar, vbellur
Keywords: Reopened, Triaged
Hardware: x86_64
OS: Linux
Fixed In Version: glusterfs-3.8rc2
Doc Type: Bug Fix
Last Closed: 2016-06-16 12:47:47 UTC
Type: Bug
Bug Blocks: 1230026, 1230195    
Attachments:
Backtrace of the core (flags: none)

Description Apeksha 2015-04-07 12:09:30 UTC
Created attachment 1011735
Backtrace of the core

Description of problem:
glusterd crashed and dumped core.
Core was generated by `glusterd --xlator-option *.upgrade=on -N'
Backtrace of the core:

(gdb) bt
#0  0x00007f5cacc66c3b in rcu_bp_register () from /lib64/liburcu-bp.so.1
#1  0x00007f5cacc66f7e in rcu_read_lock_bp () from /lib64/liburcu-bp.so.1
#2  0x00007f5cad52d5d6 in __glusterd_peer_rpc_notify (rpc=rpc@entry=0x7f5cba0a17a0, mydata=mydata@entry=0x7f5cba0964a0, event=event@entry=RPC_CLNT_CONNECT, data=data@entry=0x0) at glusterd-handler.c:4681
#3  0x00007f5cad5250ec in glusterd_big_locked_notify (rpc=0x7f5cba0a17a0, mydata=0x7f5cba0964a0, event=RPC_CLNT_CONNECT, data=0x0, notify_fn=0x7f5cad52d580 <__glusterd_peer_rpc_notify>) at glusterd-handler.c:71
#4  0x00007f5cb827e610 in rpc_clnt_notify (trans=<optimized out>, mydata=0x7f5cba0a17d0, event=<optimized out>, data=<optimized out>) at rpc-clnt.c:926
#5  0x00007f5cb827a4c3 in rpc_transport_notify (this=this@entry=0x7f5cba0a4930, event=event@entry=RPC_TRANSPORT_CONNECT, data=data@entry=0x7f5cba0a4930) at rpc-transport.c:543
#6  0x00007f5caad75a27 in socket_connect_finish (this=this@entry=0x7f5cba0a4930) at socket.c:2366
#7  0x00007f5caad7af7f in socket_event_handler (fd=fd@entry=11, idx=idx@entry=2, data=0x7f5cba0a4930, poll_in=0, poll_out=4, poll_err=0) at socket.c:2396
#8  0x00007f5cb8504c1a in event_dispatch_epoll_handler (event=0x7f5ca8db1ec0, event_pool=0x7f5cba033c50) at event-epoll.c:572
#9  event_dispatch_epoll_worker (data=0x7f5cba0425b0) at event-epoll.c:674
#10 0x00007f5cb781fdf5 in start_thread () from /lib64/libpthread.so.0
#11 0x00007f5cb71661ad in clone () from /lib64/libc.so.6


Version-Release number of selected component (if applicable):
Upstream glusterfs 3.7 on a RHEL 7.1 server

How reproducible:


Steps to Reproduce:
1. Not a manual process.
2. Run BVT on a RHEL 7.1 server with upstream glusterfs 3.7 packages.
3. Received a core from one of the servers when the fssanity tests failed after a watchdog timeout.


Actual results:


Expected results: The server should not crash.


Additional info:

Comment 4 Kaushal 2015-04-08 09:55:59 UTC
The crash is caused by a race between process exit and a socket event.
```
(gdb) thr a a bt

Thread 7 (Thread 0x7f5ca95b3700 (LWP 13303)):
#0  0x00007f5cb7823705 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f5cad5af353 in hooks_worker (args=<optimized out>) at glusterd-hooks.c:501
#2  0x00007f5cb781fdf5 in start_thread () from /lib64/libpthread.so.0
#3  0x00007f5cb71661ad in clone () from /lib64/libc.so.6

Thread 6 (Thread 0x7f5caf040700 (LWP 13151)):
#0  0x00007f5cab879859 in __do_global_dtors_aux () from /lib64/libselinux.so.1
#1  0x00007f5cb8744b5a in _dl_fini () from /lib64/ld-linux-x86-64.so.2
#2  0x00007f5cb70a8e49 in __run_exit_handlers () from /lib64/libc.so.6
#3  0x00007f5cb70a8e95 in exit () from /lib64/libc.so.6
#4  0x00007f5cb896253a in cleanup_and_exit (signum=<optimized out>) at glusterfsd.c:1242
#5  0x00007f5cb8962625 in glusterfs_sigwaiter (arg=<optimized out>) at glusterfsd.c:1983
#6  0x00007f5cb781fdf5 in start_thread () from /lib64/libpthread.so.0
#7  0x00007f5cb71661ad in clone () from /lib64/libc.so.6

Thread 5 (Thread 0x7f5caf841700 (LWP 13150)):
#0  0x00007f5cb782699d in nanosleep () from /lib64/libpthread.so.0
#1  0x00007f5cb84c7224 in gf_timer_proc (ctx=0x7f5cba015010) at timer.c:191
#2  0x00007f5cb781fdf5 in start_thread () from /lib64/libpthread.so.0
#3  0x00007f5cb71661ad in clone () from /lib64/libc.so.6

Thread 4 (Thread 0x7f5cae83f700 (LWP 13152)):
#0  0x00007f5cb7823ab2 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f5cb84e9028 in syncenv_task (proc=proc@entry=0x7f5cba042a20) at syncop.c:591
#2  0x00007f5cb84e9c90 in syncenv_processor (thdata=0x7f5cba042a20) at syncop.c:683
#3  0x00007f5cb781fdf5 in start_thread () from /lib64/libpthread.so.0
#4  0x00007f5cb71661ad in clone () from /lib64/libc.so.6

Thread 3 (Thread 0x7f5cae03e700 (LWP 13153)):
#0  0x00007f5cb7823ab2 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f5cb84e9028 in syncenv_task (proc=proc@entry=0x7f5cba042de0) at syncop.c:591
#2  0x00007f5cb84e9c90 in syncenv_processor (thdata=0x7f5cba042de0) at syncop.c:683
#3  0x00007f5cb781fdf5 in start_thread () from /lib64/libpthread.so.0
#4  0x00007f5cb71661ad in clone () from /lib64/libc.so.6

Thread 2 (Thread 0x7f5cb8946740 (LWP 13149)):
#0  0x00007f5cb7820f27 in pthread_join () from /lib64/libpthread.so.0
#1  0x00007f5cb8504f25 in event_dispatch_epoll (event_pool=0x7f5cba033c50) at event-epoll.c:759
#2  0x00007f5cb895f61a in main (argc=4, argv=0x7fff8e7dc2b8) at glusterfsd.c:2313

Thread 1 (Thread 0x7f5ca8db2700 (LWP 13304)):
#0  0x00007f5cacc66c3b in rcu_bp_register () from /lib64/liburcu-bp.so.1
#1  0x00007f5cacc66f7e in rcu_read_lock_bp () from /lib64/liburcu-bp.so.1
#2  0x00007f5cad52d5d6 in __glusterd_peer_rpc_notify (rpc=rpc@entry=0x7f5cba0a17a0, mydata=mydata@entry=0x7f5cba0964a0, event=event@entry=RPC_CLNT_CONNECT, data=data@entry=0x0) at glusterd-handler.c:4681
#3  0x00007f5cad5250ec in glusterd_big_locked_notify (rpc=0x7f5cba0a17a0, mydata=0x7f5cba0964a0, event=RPC_CLNT_CONNECT, data=0x0, notify_fn=0x7f5cad52d580 <__glusterd_peer_rpc_notify>) at glusterd-handler.c:71
#4  0x00007f5cb827e610 in rpc_clnt_notify (trans=<optimized out>, mydata=0x7f5cba0a17d0, event=<optimized out>, data=<optimized out>) at rpc-clnt.c:926
#5  0x00007f5cb827a4c3 in rpc_transport_notify (this=this@entry=0x7f5cba0a4930, event=event@entry=RPC_TRANSPORT_CONNECT, data=data@entry=0x7f5cba0a4930) at rpc-transport.c:543
#6  0x00007f5caad75a27 in socket_connect_finish (this=this@entry=0x7f5cba0a4930) at socket.c:2366
#7  0x00007f5caad7af7f in socket_event_handler (fd=fd@entry=11, idx=idx@entry=2, data=0x7f5cba0a4930, poll_in=0, poll_out=4, poll_err=0) at socket.c:2396
#8  0x00007f5cb8504c1a in event_dispatch_epoll_handler (event=0x7f5ca8db1ec0, event_pool=0x7f5cba033c50) at event-epoll.c:572
#9  event_dispatch_epoll_worker (data=0x7f5cba0425b0) at event-epoll.c:674
#10 0x00007f5cb781fdf5 in start_thread () from /lib64/libpthread.so.0
#11 0x00007f5cb71661ad in clone () from /lib64/libc.so.6
(gdb)
```

As the backtraces show, thread 6 is in the middle of exiting the process. It has already run exit handlers that clean up process-wide resources, including those belonging to liburcu. By the time thread 1 calls rcu_bp_register(), the liburcu resources have already been cleaned up. rcu_bp_register() then tries to access these resources, which no longer exist, and that leads to the segmentation fault.
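The ordering problem can be modelled in a few lines of C. This is only a sketch with hypothetical names (struct registry and the registry_* functions are made up for illustration); it does not reproduce liburcu's actual internals. The `alive` check marks the point where an unguarded call made after teardown would touch freed memory.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a library's process-wide state.
 * This is NOT how liburcu is implemented; it only models the ordering. */
struct registry {
    int *slots;
    size_t nslots;
    bool alive;
};

static struct registry g_registry;

/* Allocate the shared state, as a library initializer might. */
static int registry_init(size_t nslots)
{
    g_registry.slots = calloc(nslots, sizeof(int));
    if (g_registry.slots == NULL)
        return -1;
    g_registry.nslots = nslots;
    g_registry.alive = true;
    return 0;
}

/* Release the shared state, as an exit handler or destructor might. */
static void registry_teardown(void)
{
    free(g_registry.slots);
    g_registry.slots = NULL;
    g_registry.nslots = 0;
    g_registry.alive = false;
}

/* Register a caller in the shared state.
 * Returns 0 on success, -1 if the state is gone or the slot is invalid.
 * Without the 'alive' check, a call after teardown would dereference
 * freed memory, which is the kind of access that crashed thread 1. */
static int registry_register(size_t slot)
{
    if (!g_registry.alive || slot >= g_registry.nslots)
        return -1;
    g_registry.slots[slot] = 1;
    return 0;
}
```

Calling registry_register() before registry_teardown() succeeds; calling it afterwards is rejected instead of faulting. In the crash above there is no such guard, and the teardown runs on a different thread from the late caller.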

Races like this are hard to fix. Because this race and the resulting crash happen when the process is almost at the point of stopping, there is no serious impact on functionality apart from the core file and the log message. I'm setting a lower priority for this bug.

Comment 5 Kaushal 2015-04-08 10:23:43 UTC
The fix should be simple enough. GlusterD's fini() does not stop the TCP socket listener. If it did, the situation above would not arise, because the listener would be stopped before the liburcu exit handler is called.
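The idea of stopping the event source before any teardown can be sketched with plain pthreads. Everything here is hypothetical (struct listener, listener_start(), listener_fini()) and is not the code from the actual patch; it only shows the ordering: signal the listener thread to stop, join it, and release shared resources only afterwards.

```c
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical listener that handles "events" on its own thread. */
struct listener {
    pthread_t thread;
    atomic_bool stop;               /* set by fini to end the loop */
    atomic_bool resources_alive;    /* models shared library state */
    atomic_int events_handled;      /* events seen while state was valid */
    atomic_int use_after_teardown;  /* events seen after teardown (the bug) */
};

/* Event loop: each iteration models one socket event that needs the
 * shared resources. */
static void *listener_loop(void *arg)
{
    struct listener *l = arg;

    while (!atomic_load(&l->stop)) {
        if (atomic_load(&l->resources_alive))
            atomic_fetch_add(&l->events_handled, 1);
        else
            atomic_fetch_add(&l->use_after_teardown, 1);
        sched_yield();
    }
    return NULL;
}

/* Initialize the state and start the listener thread.
 * Returns 0 on success, or the pthread_create() error code. */
static int listener_start(struct listener *l)
{
    atomic_init(&l->stop, false);
    atomic_init(&l->resources_alive, true);
    atomic_init(&l->events_handled, 0);
    atomic_init(&l->use_after_teardown, 0);
    return pthread_create(&l->thread, NULL, listener_loop, l);
}

/* Shutdown in the safe order: stop and join the listener first,
 * and only then tear down the resources it depends on. */
static void listener_fini(struct listener *l)
{
    atomic_store(&l->stop, true);
    pthread_join(l->thread, NULL);
    atomic_store(&l->resources_alive, false);
}
```

Because the resources are marked dead only after pthread_join() returns, the loop can never observe them as torn down, so use_after_teardown stays at zero. Reversing the two steps in listener_fini() reintroduces the window described in the previous comment.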

Comment 6 Anand Avati 2015-04-10 11:42:37 UTC
REVIEW: http://review.gluster.org/10197 (glusterd: This patch stops tcp/ip listeners during  glusterd exit.) posted (#1) for review on master by Anand Nekkunti (anekkunt)

Comment 7 Anand Avati 2015-04-13 09:03:52 UTC
REVIEW: http://review.gluster.org/10197 (glusterd: Stop tcp/ip listeners during  glusterd exit) posted (#2) for review on master by Anand Nekkunti (anekkunt)

Comment 8 Anand Avati 2015-04-16 09:18:47 UTC
REVIEW: http://review.gluster.org/10197 (glusterd: Stop tcp/ip listeners during  glusterd exit) posted (#3) for review on master by Anand Nekkunti (anekkunt)

Comment 9 Anand Avati 2015-04-24 11:14:46 UTC
REVIEW: http://review.gluster.org/10197 (glusterd: Stop tcp/ip listeners during  glusterd exit.) posted (#4) for review on master by Anand Nekkunti (anekkunt)

Comment 10 Anand Avati 2015-05-12 09:31:12 UTC
REVIEW: http://review.gluster.org/10758 (ligglusterfs: Enabling the fini()  in cleanup_and_exit()) posted (#1) for review on master by Anand Nekkunti (anekkunt)

Comment 11 Anand Avati 2015-05-12 09:44:28 UTC
REVIEW: http://review.gluster.org/10758 (libglusterfs: Enabling the fini()  in cleanup_and_exit()) posted (#2) for review on master by Anand Nekkunti (anekkunt)

Comment 12 Anand Avati 2015-05-12 18:39:11 UTC
REVIEW: http://review.gluster.org/10758 (libglusterfs: Enabling the fini()  in cleanup_and_exit()) posted (#3) for review on master by Anand Nekkunti (anekkunt)

Comment 13 Anand Avati 2015-05-13 11:01:22 UTC
REVIEW: http://review.gluster.org/10758 (libglusterfs: Enabling the fini()  in cleanup_and_exit()) posted (#4) for review on master by Anand Nekkunti (anekkunt)

Comment 14 Anand Avati 2015-05-14 19:00:23 UTC
REVIEW: http://review.gluster.org/10758 (libglusterfs: Enabling the fini()  in cleanup_and_exit()) posted (#5) for review on master by Anand Nekkunti (anekkunt)

Comment 15 Anand Avati 2015-05-14 19:23:41 UTC
REVIEW: http://review.gluster.org/10758 (libglusterfs: Enabling the fini()  in cleanup_and_exit()) posted (#6) for review on master by Anand Nekkunti (anekkunt)

Comment 16 Anand Avati 2015-05-20 09:05:47 UTC
REVIEW: http://review.gluster.org/10758 (libglusterfs: Enabling the fini()  in cleanup_and_exit()) posted (#7) for review on master by Anand Nekkunti (anekkunt)

Comment 17 Anand Avati 2015-05-22 03:55:47 UTC
REVIEW: http://review.gluster.org/10758 (libglusterfs: Enabling the fini()  in cleanup_and_exit()) posted (#9) for review on master by Anand Nekkunti (anekkunt)

Comment 18 Anand Avati 2015-05-22 06:14:01 UTC
REVIEW: http://review.gluster.org/10758 (libglusterfs: Enabling the fini()  in cleanup_and_exit()) posted (#10) for review on master by Anand Nekkunti (anekkunt)

Comment 19 Anand Avati 2015-05-22 13:49:15 UTC
REVIEW: http://review.gluster.org/10197 (glusterd: Stop tcp/ip listeners during  glusterd exit) posted (#6) for review on master by Anand Nekkunti (anekkunt)

Comment 20 Anand Avati 2015-05-22 14:24:10 UTC
REVIEW: http://review.gluster.org/10894 (bglusterfs: Enabling the fini()  in cleanup_and_exit()) posted (#2) for review on master by Anand Nekkunti (anekkunt)

Comment 21 Anand Avati 2015-05-22 16:46:01 UTC
REVIEW: http://review.gluster.org/10894 (bglusterfs: Enabling the fini()  in cleanup_and_exit()) posted (#3) for review on master by Anand Nekkunti (anekkunt)

Comment 22 Anand Avati 2015-05-23 11:10:11 UTC
REVIEW: http://review.gluster.org/10197 (glusterd: Stop tcp/ip listeners during  glusterd exit) posted (#7) for review on master by Anand Nekkunti (anekkunt)

Comment 23 Anand Avati 2015-05-23 11:17:57 UTC
REVIEW: http://review.gluster.org/10894 (bglusterfs: Enabling the fini()  in cleanup_and_exit()) posted (#4) for review on master by Anand Nekkunti (anekkunt)

Comment 24 Anand Avati 2015-05-23 11:18:00 UTC
REVIEW: http://review.gluster.org/10197 (glusterd: Stop tcp/ip listeners during  glusterd exit.) posted (#8) for review on master by Anand Nekkunti (anekkunt)

Comment 25 Anand Avati 2015-05-23 11:42:48 UTC
REVIEW: http://review.gluster.org/10894 (libglusterfs: Enabling the fini()  in cleanup_and_exit()) posted (#5) for review on master by Anand Nekkunti (anekkunt)

Comment 26 Anand Avati 2015-05-23 11:42:51 UTC
REVIEW: http://review.gluster.org/10197 (glusterd: Stop tcp/ip listeners during  glusterd exit) posted (#9) for review on master by Anand Nekkunti (anekkunt)

Comment 27 Anand Avati 2015-05-23 17:00:22 UTC
REVIEW: http://review.gluster.org/10894 (libglusterfs: Enabling the fini()  in cleanup_and_exit()) posted (#6) for review on master by Anand Nekkunti (anekkunt)

Comment 28 Anand Avati 2015-05-24 09:39:58 UTC
REVIEW: http://review.gluster.org/10894 (libglusterfs: Enabling the fini()  in cleanup_and_exit()) posted (#7) for review on master by Anand Nekkunti (anekkunt)

Comment 29 Anand Avati 2015-05-24 09:40:00 UTC
REVIEW: http://review.gluster.org/10197 (glusterd: Stop tcp/ip listeners during  glusterd exit) posted (#10) for review on master by Anand Nekkunti (anekkunt)

Comment 30 Christopher Pereira 2015-05-25 21:33:03 UTC
*** Bug 1220623 has been marked as a duplicate of this bug. ***

Comment 31 Anand Avati 2015-05-28 04:46:50 UTC
REVIEW: http://review.gluster.org/10197 (glusterd: Stop tcp/ip listeners during  glusterd exit) posted (#11) for review on master by Atin Mukherjee (amukherj)

Comment 32 Anand Avati 2015-05-28 05:21:30 UTC
REVIEW: http://review.gluster.org/10894 (glusterfsd: Enabling the fini()  in cleanup_and_exit()) posted (#8) for review on master by Kaushal M (kaushal)

Comment 33 Anand Avati 2015-05-28 05:22:01 UTC
REVIEW: http://review.gluster.org/10894 (glusterfsd: Enabling the fini()  in cleanup_and_exit()) posted (#9) for review on master by Kaushal M (kaushal)

Comment 34 Anand Avati 2015-05-29 01:13:38 UTC
REVIEW: http://review.gluster.org/10197 (glusterd: Stop tcp/ip listeners during  glusterd exit) posted (#12) for review on master by Krishnan Parthasarathi (kparthas)

Comment 35 Anand Avati 2015-05-30 04:25:05 UTC
REVIEW: http://review.gluster.org/10197 (glusterd: Stop tcp/ip listeners during  glusterd exit) posted (#13) for review on master by Anand Nekkunti (anekkunt)

Comment 36 Anand Avati 2015-06-01 06:15:28 UTC
REVIEW: http://review.gluster.org/10197 (glusterd: Stop tcp/ip listeners during  glusterd exit) posted (#14) for review on master by Anand Nekkunti (anekkunt)

Comment 37 Anand Avati 2015-06-01 08:37:28 UTC
REVIEW: http://review.gluster.org/10197 (glusterd: Stop tcp/ip listeners during  glusterd exit) posted (#15) for review on master by Anand Nekkunti (anekkunt)

Comment 38 Anand Avati 2015-06-02 03:35:10 UTC
REVIEW: http://review.gluster.org/10197 (glusterd: Stop tcp/ip listeners during  glusterd exit) posted (#16) for review on master by Anand Nekkunti (anekkunt)

Comment 39 Anand Avati 2015-06-03 08:37:42 UTC
REVIEW: http://review.gluster.org/10894 (libglusterfs: Enabling the fini()  in cleanup_and_exit()) posted (#10) for review on master by Anand Nekkunti (anekkunt)

Comment 40 Anand Avati 2015-06-03 08:37:45 UTC
REVIEW: http://review.gluster.org/10197 (glusterd: Stop tcp/ip listeners during  glusterd exit) posted (#17) for review on master by Anand Nekkunti (anekkunt)

Comment 41 Anand Avati 2015-06-03 08:43:02 UTC
REVIEW: http://review.gluster.org/10197 (glusterd: Stop tcp/ip listeners during  glusterd exit) posted (#18) for review on master by Anand Nekkunti (anekkunt)

Comment 42 Anand Avati 2015-06-03 08:56:27 UTC
REVIEW: http://review.gluster.org/10197 (glusterd: Stop tcp/ip listeners during  glusterd exit) posted (#19) for review on master by Anand Nekkunti (anekkunt)

Comment 43 Nagaprasad Sathyanarayana 2015-10-25 14:59:06 UTC
The fix for this BZ is already present in a GlusterFS release. A clone of this BZ has been fixed in a GlusterFS release and closed, so this mainline BZ is being closed as well.

Comment 44 Niels de Vos 2016-06-16 12:47:47 UTC
This bug is being closed because a release has been made available that should address the reported issue. If the problem is still not fixed with glusterfs-3.8.0, please open a new bug report.

glusterfs-3.8.0 has been announced on the Gluster mailing lists [1]. Packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://blog.gluster.org/2016/06/glusterfs-3-8-released/
[2] http://thread.gmane.org/gmane.comp.file-systems.gluster.user