Bug 1397669

Summary: glusterd core found due to segmentation fault
Product: [Red Hat Storage] Red Hat Gluster Storage
Component: glusterd
Version: rhgs-3.2
Hardware: Unspecified
OS: Unspecified
Status: CLOSED DUPLICATE
Severity: urgent
Priority: unspecified
Reporter: Nag Pavan Chilakam <nchilaka>
Assignee: Atin Mukherjee <amukherj>
QA Contact: Byreddy <bsrirama>
CC: rhs-bugs, storage-qa-internal, vbellur
Type: Bug
Last Closed: 2016-11-23 07:55:36 UTC

Description Nag Pavan Chilakam 2016-11-23 06:22:40 UTC
Description of problem:
=======================
I see a few glusterd cores in my six-node cluster.
This was probably triggered by the upgrade from build 3.8.4-3 to 3.8.4-5.

[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `/usr/sbin/glusterd -p /var/run/glusterd.pid --log-level INFO'.
Program terminated with signal 11, Segmentation fault.
#0  0x00007fc0f2ff00ad in rcu_read_lock_bp () from /lib64/liburcu-bp.so.1
Missing separate debuginfos, use: debuginfo-install glusterfs-server-3.8.4-5.el7rhgs.x86_64
(gdb) bt
#0  0x00007fc0f2ff00ad in rcu_read_lock_bp () from /lib64/liburcu-bp.so.1
#1  0x00007fc0f3587c8e in __glusterd_peer_rpc_notify ()
   from /usr/lib64/glusterfs/3.8.4/xlator/mgmt/glusterd.so
#2  0x00007fc0f357e18c in glusterd_big_locked_notify ()
   from /usr/lib64/glusterfs/3.8.4/xlator/mgmt/glusterd.so
#3  0x00007fc0febd38ed in rpc_clnt_notify () from /lib64/libgfrpc.so.0
#4  0x00007fc0febcf883 in rpc_transport_notify () from /lib64/libgfrpc.so.0
#5  0x00007fc0f09f02f2 in socket_event_handler ()
   from /usr/lib64/glusterfs/3.8.4/rpc-transport/socket.so
#6  0x00007fc0fee63340 in event_dispatch_epoll_worker () from /lib64/libglusterfs.so.0
#7  0x00007fc0fdc6adc5 in start_thread () from /lib64/libpthread.so.0
#8  0x00007fc0fd5af73d in clone () from /lib64/libc.so.6
(gdb) quit
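
For context on frames #0 and #1: glusterd uses the "bulletproof" (bp) flavour of userspace RCU, so a plain rcu_read_lock() call in the peer notification path resolves to rcu_read_lock_bp() in /lib64/liburcu-bp.so.1. A minimal sketch of that read-side pattern follows; the handler body is illustrative, not glusterd's actual code:

/* Sketch of the liburcu-bp read-side critical section that frame #0 is
 * executing. With the bp flavour, rcu_read_lock() maps to
 * rcu_read_lock_bp(), and threads are registered automatically on first
 * use. The body below is illustrative, not __glusterd_peer_rpc_notify(). */
#include <urcu-bp.h>

static void peer_rpc_notify_sketch(void)
{
        rcu_read_lock();        /* frame #0: rcu_read_lock_bp() */
        /* ... walk the RCU-protected peer list and update peer state ... */
        rcu_read_unlock();
}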

Comment 2 Nag Pavan Chilakam 2016-11-23 06:35:05 UTC
(gdb) t a a bt

Thread 7 (Thread 0x7fc0ff2e1780 (LWP 25686)):
#0  0x00007fc0fdc6bef7 in pthread_join () from /lib64/libpthread.so.0
#1  0x00007fc0fee637c8 in event_dispatch_epoll () from /lib64/libglusterfs.so.0
#2  0x00007fc0ff2fbae2 in main ()

Thread 6 (Thread 0x7fc0f50d5700 (LWP 25688)):
#0  0x00007fc0f2a9b300 in __do_global_dtors_aux () from /lib64/liblvm2app.so.2.2
#1  0x00007fc0ff0e285a in _dl_fini () from /lib64/ld-linux-x86-64.so.2
#2  0x00007fc0fd4f0a49 in __run_exit_handlers () from /lib64/libc.so.6
#3  0x00007fc0fd4f0a95 in exit () from /lib64/libc.so.6
#4  0x00007fc0ff2feb5e in cleanup_and_exit ()
#5  0x00007fc0ff2fec45 in glusterfs_sigwaiter ()
#6  0x00007fc0fdc6adc5 in start_thread () from /lib64/libpthread.so.0
#7  0x00007fc0fd5af73d in clone () from /lib64/libc.so.6

Thread 5 (Thread 0x7fc0ef7e0700 (LWP 25891)):
#0  0x00007fc0fdc6e6d5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007fc0f362e2c3 in glusterd_hooks_stub_init ()
   from /usr/lib64/glusterfs/3.8.4/xlator/mgmt/glusterd.so
#2  0x6c40ade6a5c67fce in ?? ()
#3  0x0000000000000000 in ?? ()

Thread 4 (Thread 0x7fc0f40d3700 (LWP 25690)):
#0  0x00007fc0fdc6ea82 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007fc0fee41df8 in syncenv_task () from /lib64/libglusterfs.so.0
#2  0x0000000000000000 in ?? ()

Thread 3 (Thread 0x7fc0f48d4700 (LWP 25689)):
#0  0x00007fc0fdc6ea82 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007fc0fee41df8 in syncenv_task () from /lib64/libglusterfs.so.0
#2  0x0000000000000000 in ?? ()

Thread 2 (Thread 0x7fc0f58d6700 (LWP 25687)):
#0  0x00007fc0fdc71bdd in nanosleep () from /lib64/libpthread.so.0
#1  0x00007fc0fee16c16 in gf_timer_proc () from /lib64/libglusterfs.so.0
#2  0x00007fc0fdc6adc5 in start_thread () from /lib64/libpthread.so.0
#3  0x00007fc0fd5af73d in clone () from /lib64/libc.so.6

Thread 1 (Thread 0x7fc0eefdf700 (LWP 25892)):
#0  0x00007fc0f2ff00ad in rcu_read_lock_bp () from /lib64/liburcu-bp.so.1
#1  0x00007fc0f3587c8e in __glusterd_peer_rpc_notify ()
   from /usr/lib64/glusterfs/3.8.4/xlator/mgmt/glusterd.so
#2  0x00007fc0f357e18c in glusterd_big_locked_notify ()
   from /usr/lib64/glusterfs/3.8.4/xlator/mgmt/glusterd.so
#3  0x00007fc0febd38ed in rpc_clnt_notify () from /lib64/libgfrpc.so.0
#4  0x00007fc0febcf883 in rpc_transport_notify () from /lib64/libgfrpc.so.0
#5  0x00007fc0f09f02f2 in socket_event_handler ()
   from /usr/lib64/glusterfs/3.8.4/rpc-transport/socket.so
#6  0x00007fc0fee63340 in event_dispatch_epoll_worker () from /lib64/libglusterfs.so.0
#7  0x00007fc0fdc6adc5 in start_thread () from /lib64/libpthread.so.0
#8  0x00007fc0fd5af73d in clone () from /lib64/libc.so.6

Comment 3 Atin Mukherjee 2016-11-23 07:55:36 UTC
Please refer to the RCA - https://bugzilla.redhat.com/show_bug.cgi?id=1238067#c13

In short, Thread 6 initiated cleanup_and_exit() as part of a glusterd shutdown, which freed all of the process's resources, including the URCU library state. Thread 1 then tried to acquire the RCU read lock in __glusterd_peer_rpc_notify(), but that state was no longer accessible, causing the segmentation fault.
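
A minimal sketch of that race, assuming nothing synchronises process exit with the still-running epoll worker; the names and timings are illustrative and this is not a deterministic reproducer:

#include <pthread.h>
#include <stdlib.h>
#include <unistd.h>
#include <urcu-bp.h>

/* Stands in for the epoll worker driving __glusterd_peer_rpc_notify()
 * (Thread 1 in the backtrace). */
static void *rpc_notify_worker(void *arg)
{
        (void)arg;
        for (;;) {
                rcu_read_lock();   /* can fault once exit-time teardown
                                      has destroyed liburcu-bp state */
                /* ... handle the peer RPC event ... */
                rcu_read_unlock();
        }
        return NULL;
}

int main(void)
{
        pthread_t worker;

        pthread_create(&worker, NULL, rpc_notify_worker, NULL);
        usleep(1000);
        /* Stands in for cleanup_and_exit() (Thread 6): exit() runs
         * __run_exit_handlers()/_dl_fini(), tearing down library state
         * while the worker is still inside its notification loop.
         * Build: cc race.c -lurcu-bp -lpthread */
        exit(EXIT_SUCCESS);
}

Whether the read-lock call actually faults depends on whether the exit-time teardown wins the race, which is consistent with the cores appearing only sporadically during the upgrade.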

Please note that this is already documented in the Known Issues chapter.

*** This bug has been marked as a duplicate of bug 1238067 ***