Bug 1397669

Summary: glusterd core found due to segmentation fault
Product: [Red Hat Storage] Red Hat Gluster Storage
Component: glusterd
Version: rhgs-3.2
Hardware: Unspecified
OS: Unspecified
Status: CLOSED DUPLICATE
Severity: urgent
Priority: unspecified
Reporter: Nag Pavan Chilakam <nchilaka>
Assignee: Atin Mukherjee <amukherj>
QA Contact: Byreddy <bsrirama>
CC: rhs-bugs, storage-qa-internal, vbellur
Type: Bug
Last Closed: 2016-11-23 07:55:36 UTC

Description Nag Pavan Chilakam 2016-11-23 06:22:40 UTC
Description of problem:
=======================
I see a few glusterd cores in my six-node cluster.
This was probably triggered by the upgrade from build 3.8.4-3 to 3.8.4-5.

[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `/usr/sbin/glusterd -p /var/run/glusterd.pid --log-level INFO'.
Program terminated with signal 11, Segmentation fault.
#0  0x00007fc0f2ff00ad in rcu_read_lock_bp () from /lib64/liburcu-bp.so.1
Missing separate debuginfos, use: debuginfo-install glusterfs-server-3.8.4-5.el7rhgs.x86_64
(gdb) bt
#0  0x00007fc0f2ff00ad in rcu_read_lock_bp () from /lib64/liburcu-bp.so.1
#1  0x00007fc0f3587c8e in __glusterd_peer_rpc_notify ()
   from /usr/lib64/glusterfs/3.8.4/xlator/mgmt/glusterd.so
#2  0x00007fc0f357e18c in glusterd_big_locked_notify ()
   from /usr/lib64/glusterfs/3.8.4/xlator/mgmt/glusterd.so
#3  0x00007fc0febd38ed in rpc_clnt_notify () from /lib64/libgfrpc.so.0
#4  0x00007fc0febcf883 in rpc_transport_notify () from /lib64/libgfrpc.so.0
#5  0x00007fc0f09f02f2 in socket_event_handler ()
   from /usr/lib64/glusterfs/3.8.4/rpc-transport/socket.so
#6  0x00007fc0fee63340 in event_dispatch_epoll_worker () from /lib64/libglusterfs.so.0
#7  0x00007fc0fdc6adc5 in start_thread () from /lib64/libpthread.so.0
#8  0x00007fc0fd5af73d in clone () from /lib64/libc.so.6
(gdb) quit
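
For context on frames #0 and #1: glusterd uses the "bulletproof" (bp) flavour of userspace RCU, so a plain rcu_read_lock() call in the peer notification path resolves to rcu_read_lock_bp() in /lib64/liburcu-bp.so.1. A minimal sketch of that read-side pattern follows; the handler body is illustrative, not glusterd's actual code:

/* Sketch of the liburcu-bp read-side critical section that frame #0 is
 * executing. With the bp flavour, rcu_read_lock() maps to
 * rcu_read_lock_bp(), and threads are registered automatically on first
 * use. The body below is illustrative, not __glusterd_peer_rpc_notify(). */
#include <urcu-bp.h>

static void peer_rpc_notify_sketch(void)
{
        rcu_read_lock();        /* frame #0: rcu_read_lock_bp() */
        /* ... walk the RCU-protected peer list and update peer state ... */
        rcu_read_unlock();
}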

Comment 2 Nag Pavan Chilakam 2016-11-23 06:35:05 UTC
(gdb) t a a bt

Thread 7 (Thread 0x7fc0ff2e1780 (LWP 25686)):
#0  0x00007fc0fdc6bef7 in pthread_join () from /lib64/libpthread.so.0
#1  0x00007fc0fee637c8 in event_dispatch_epoll () from /lib64/libglusterfs.so.0
#2  0x00007fc0ff2fbae2 in main ()

Thread 6 (Thread 0x7fc0f50d5700 (LWP 25688)):
#0  0x00007fc0f2a9b300 in __do_global_dtors_aux () from /lib64/liblvm2app.so.2.2
#1  0x00007fc0ff0e285a in _dl_fini () from /lib64/ld-linux-x86-64.so.2
#2  0x00007fc0fd4f0a49 in __run_exit_handlers () from /lib64/libc.so.6
#3  0x00007fc0fd4f0a95 in exit () from /lib64/libc.so.6
#4  0x00007fc0ff2feb5e in cleanup_and_exit ()
#5  0x00007fc0ff2fec45 in glusterfs_sigwaiter ()
#6  0x00007fc0fdc6adc5 in start_thread () from /lib64/libpthread.so.0
#7  0x00007fc0fd5af73d in clone () from /lib64/libc.so.6

Thread 5 (Thread 0x7fc0ef7e0700 (LWP 25891)):
#0  0x00007fc0fdc6e6d5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007fc0f362e2c3 in glusterd_hooks_stub_init ()
   from /usr/lib64/glusterfs/3.8.4/xlator/mgmt/glusterd.so
#2  0x6c40ade6a5c67fce in ?? ()
#3  0x0000000000000000 in ?? ()

Thread 4 (Thread 0x7fc0f40d3700 (LWP 25690)):
#0  0x00007fc0fdc6ea82 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007fc0fee41df8 in syncenv_task () from /lib64/libglusterfs.so.0
#2  0x0000000000000000 in ?? ()

Thread 3 (Thread 0x7fc0f48d4700 (LWP 25689)):
#0  0x00007fc0fdc6ea82 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007fc0fee41df8 in syncenv_task () from /lib64/libglusterfs.so.0
#2  0x0000000000000000 in ?? ()

Thread 2 (Thread 0x7fc0f58d6700 (LWP 25687)):
#0  0x00007fc0fdc71bdd in nanosleep () from /lib64/libpthread.so.0
#1  0x00007fc0fee16c16 in gf_timer_proc () from /lib64/libglusterfs.so.0
#2  0x00007fc0fdc6adc5 in start_thread () from /lib64/libpthread.so.0
#3  0x00007fc0fd5af73d in clone () from /lib64/libc.so.6

Thread 1 (Thread 0x7fc0eefdf700 (LWP 25892)):
#0  0x00007fc0f2ff00ad in rcu_read_lock_bp () from /lib64/liburcu-bp.so.1
#1  0x00007fc0f3587c8e in __glusterd_peer_rpc_notify ()
   from /usr/lib64/glusterfs/3.8.4/xlator/mgmt/glusterd.so
#2  0x00007fc0f357e18c in glusterd_big_locked_notify ()
   from /usr/lib64/glusterfs/3.8.4/xlator/mgmt/glusterd.so
#3  0x00007fc0febd38ed in rpc_clnt_notify () from /lib64/libgfrpc.so.0
#4  0x00007fc0febcf883 in rpc_transport_notify () from /lib64/libgfrpc.so.0
#5  0x00007fc0f09f02f2 in socket_event_handler ()
   from /usr/lib64/glusterfs/3.8.4/rpc-transport/socket.so
#6  0x00007fc0fee63340 in event_dispatch_epoll_worker () from /lib64/libglusterfs.so.0
#7  0x00007fc0fdc6adc5 in start_thread () from /lib64/libpthread.so.0
#8  0x00007fc0fd5af73d in clone () from /lib64/libc.so.6

Comment 3 Atin Mukherjee 2016-11-23 07:55:36 UTC
Please refer to the RCA - https://bugzilla.redhat.com/show_bug.cgi?id=1238067#c13

In short, Thread 6 initiated cleanup_and_exit() as part of a glusterd shutdown, which freed all of the process's resources, including the URCU library state. Thread 1 then tried to acquire the RCU read lock in __glusterd_peer_rpc_notify(), but that state was no longer accessible, causing the segmentation fault.
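
A minimal sketch of that race, assuming nothing synchronises process exit with the still-running epoll worker; the names and timings are illustrative and this is not a deterministic reproducer:

#include <pthread.h>
#include <stdlib.h>
#include <unistd.h>
#include <urcu-bp.h>

/* Stands in for the epoll worker driving __glusterd_peer_rpc_notify()
 * (Thread 1 in the backtrace). */
static void *rpc_notify_worker(void *arg)
{
        (void)arg;
        for (;;) {
                rcu_read_lock();   /* can fault once exit-time teardown
                                      has destroyed liburcu-bp state */
                /* ... handle the peer RPC event ... */
                rcu_read_unlock();
        }
        return NULL;
}

int main(void)
{
        pthread_t worker;

        pthread_create(&worker, NULL, rpc_notify_worker, NULL);
        usleep(1000);
        /* Stands in for cleanup_and_exit() (Thread 6): exit() runs
         * __run_exit_handlers()/_dl_fini(), tearing down library state
         * while the worker is still inside its notification loop.
         * Build: cc race.c -lurcu-bp -lpthread */
        exit(EXIT_SUCCESS);
}

Whether the read-lock call actually faults depends on whether the exit-time teardown wins the race, which is consistent with the cores appearing only sporadically during the upgrade.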

Please note that this is already documented in the Known Issues chapter.

*** This bug has been marked as a duplicate of bug 1238067 ***