Bug 1725024 - glustershd dumped core with seg fault at afr_has_quorum (shd mux related)
Summary: glustershd dumped core with seg fault at afr_has_quorum (shd mux related)
Keywords:
Status: CLOSED DEFERRED
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: replicate
Version: rhgs-3.5
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
: ---
Assignee: Mohammed Rafi KC
QA Contact: Nag Pavan Chilakam
URL:
Whiteboard: shd-multiplexing
Duplicates: 1725022 (view as bug list)
Depends On:
Blocks: 1529501 1727329
 
Reported: 2019-06-28 09:32 UTC by Nag Pavan Chilakam
Modified: 2020-01-20 09:29 UTC (History)
5 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
: 1727329 (view as bug list)
Environment:
Last Closed: 2020-01-20 09:29:38 UTC
Embargoed:



Description Nag Pavan Chilakam 2019-06-28 09:32:43 UTC
Description of problem:
====================
Hit this problem along with BZ#1725022, but on a different node and probably in a different iteration.

I was running volume creates and deletes on my brick-mux setup, creating volumes of different types. After about 10 hours, shd crashed and dumped core with the backtrace below.



warning: core file may not match specified executable file.
Reading symbols from /usr/sbin/glusterfsd...Reading symbols from /usr/lib/debug/usr/sbin/glusterfsd.debug...done.
done.
Missing separate debuginfo for 
Try: yum --enablerepo='*debug*' install /usr/lib/debug/.build-id/c8/fbb951579a5ccf45f786661b585545f43e4870
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `/usr/sbin/glusterfs -s localhost --volfile-id shd/rep_vnpuvcukvzkhb -p /var/run'.
Program terminated with signal 11, Segmentation fault.
#0  afr_has_quorum (subvols=0x7f94eb2093d0 "", this=this@entry=0x7f94eb0bc070, frame=frame@entry=0x0) at afr-transaction.c:871
871	    up_children_count = AFR_COUNT(subvols, priv->child_count);
Missing separate debuginfos, use: debuginfo-install glibc-2.17-292.el7.x86_64 keyutils-libs-1.5.8-3.el7.x86_64 krb5-libs-1.15.1-37.el7_6.x86_64 libcom_err-1.42.9-15.el7.x86_64 libgcc-4.8.5-39.el7.x86_64 libselinux-2.5-14.1.el7.x86_64 libuuid-2.23.2-61.el7.x86_64 openssl-libs-1.0.2k-19.el7.x86_64 pcre-8.32-17.el7.x86_64 zlib-1.2.7-18.el7.x86_64
(gdb) bt
#0  afr_has_quorum (subvols=0x7f94eb2093d0 "", this=this@entry=0x7f94eb0bc070, frame=frame@entry=0x0) at afr-transaction.c:871
#1  0x00007fa3bf5b82cd in afr_notify (this=0x7f94eb0bc070, event=6, data=0x7f94eb0a3f30, data2=<optimized out>) at afr-common.c:5586
#2  0x00007fa3bf5b86c9 in notify (this=<optimized out>, event=<optimized out>, data=<optimized out>) at afr.c:42
#3  0x00007fa3cdf682a2 in xlator_notify (xl=0x7f94eb0bc070, event=event@entry=6, data=data@entry=0x7f94eb0a3f30) at xlator.c:692
#4  0x00007fa3ce01fd15 in default_notify (this=this@entry=0x7f94eb0a3f30, event=event@entry=6, data=data@entry=0x0) at defaults.c:3388
#5  0x00007fa3bf7f2469 in client_notify_dispatch (this=this@entry=0x7f94eb0a3f30, event=event@entry=6, data=data@entry=0x0) at client.c:97
#6  0x00007fa3bf7f24ca in client_notify_dispatch_uniq (this=this@entry=0x7f94eb0a3f30, event=event@entry=6, data=data@entry=0x0) at client.c:71
#7  0x00007fa3bf7f348d in client_rpc_notify (rpc=0x7f94eb5d08b0, mydata=0x7f94eb0a3f30, event=<optimized out>, data=<optimized out>) at client.c:2365
#8  0x00007fa3cdd14203 in rpc_clnt_handle_disconnect (conn=0x7f94eb5d08e0, clnt=0x7f94eb5d08b0) at rpc-clnt.c:826
#9  rpc_clnt_notify (trans=0x7f94eb5d0b70, mydata=0x7f94eb5d08e0, event=RPC_TRANSPORT_DISCONNECT, data=<optimized out>) at rpc-clnt.c:887
#10 0x00007fa3cdd10a53 in rpc_transport_notify (this=this@entry=0x7f94eb5d0b70, event=event@entry=RPC_TRANSPORT_DISCONNECT, data=data@entry=0x7f94eb5d0b70) at rpc-transport.c:547
#11 0x00007fa3c232a2df in socket_event_poll_err (this=this@entry=0x7f94eb5d0b70, gen=gen@entry=1, idx=idx@entry=4638) at socket.c:1385
#12 0x00007fa3c232c6ea in socket_event_handler (fd=<optimized out>, idx=<optimized out>, gen=<optimized out>, data=0x7f94eb5d0b70, poll_in=<optimized out>, poll_out=<optimized out>, poll_err=16, event_thread_died=0 '\000')
    at socket.c:3008
#13 0x00007fa3cdfd1416 in event_dispatch_epoll_handler (event=0x7fa3c0277e70, event_pool=0x5587f336f5b0) at event-epoll.c:648
#14 event_dispatch_epoll_worker (data=0x5587f33c1fd0) at event-epoll.c:761
#15 0x00007fa3ccda9ea5 in start_thread () from /lib64/libpthread.so.0
#16 0x00007fa3cc66f8cd in clone () from /lib64/libc.so.6



Version-Release number of selected component (if applicable):
===================
6.0.6 


How reproducible:
===============
Hit three times on this cluster.

Steps to Reproduce:
1. Created a 3-node brick-mux setup.
2. Triggered a script that randomly creates 1000 volumes of different types (single-brick, rep3, dist-rep3, arb, dist-arb, ecv, dist-ecv).
3. Deleted all the volumes.
4. Repeated steps 2 and 3.

for j in {1..100}; do
    echo "########## loop $j #### " |& tee -a volc.log
    date |& tee -a volc.log
    for i in {1..1000}; do
        python randvol-create.py |& tee -a volc.log
    done
    for v in $(gluster v list); do
        gluster v stop $v --mode=script |& tee -a volc.log
        date |& tee -a volc.log
        gluster v del $v --mode=script |& tee -a volc.log
    done
done

Actual results:
================
glustershd crashed as above.
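For context, the faulting line (afr-transaction.c:871, up_children_count = AFR_COUNT(subvols, priv->child_count)) dereferences priv taken from this->private; under shd multiplexing a concurrent graph teardown can leave that pointer NULL, which matches the SIGSEGV in the backtrace above. The sketch below is purely illustrative, not the real gluster code: the struct and function names (fake_afr_private_t, has_quorum_guarded, count_up_children) are made up, AFR_COUNT is approximated by a simple non-zero counter, and the majority check is a simplification of the real quorum logic.

```c
#include <stddef.h>

/* Illustrative stand-ins for this->private and its child count. */
typedef struct {
    int child_count;
} fake_afr_private_t;

typedef struct {
    fake_afr_private_t *private;
} fake_xlator_t;

/* Mimics AFR_COUNT: counts the non-zero entries of an array. */
static int count_up_children(const unsigned char *subvols, int n)
{
    int i, count = 0;
    for (i = 0; i < n; i++)
        if (subvols[i])
            count++;
    return count;
}

/* Defensive variant: if teardown has already cleared this->private,
 * report "no quorum" instead of dereferencing a dangling pointer. */
static int has_quorum_guarded(fake_xlator_t *this, const unsigned char *subvols)
{
    if (!this || !this->private || !subvols)
        return 0; /* treat as quorum lost rather than crash */
    return count_up_children(subvols, this->private->child_count) >
           this->private->child_count / 2;
}
```

Note that such a guard only papers over the symptom; the actual fix (see the upstream patch in comment 10) has to address the ordering between notification delivery and xlator cleanup in the multiplexed shd graph.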

Comment 5 Mohammed Rafi KC 2019-06-28 14:10:56 UTC
*** Bug 1725022 has been marked as a duplicate of this bug. ***

Comment 8 Nag Pavan Chilakam 2019-07-03 11:06:19 UTC
Another crash with a similar backtrace was hit when trying to add bricks randomly to any one of 3 different volumes (1 each of disp, rep3, arb); tried this 20 times.
warning: core file may not match specified executable file.
Reading symbols from /usr/sbin/glusterfsd...Reading symbols from /usr/lib/debug/usr/sbin/glusterfsd.debug...done.
done.
Missing separate debuginfo for 
Try: yum --enablerepo='*debug*' install /usr/lib/debug/.build-id/c8/fbb951579a5ccf45f786661b585545f43e4870
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `/usr/sbin/glusterfs -s localhost --volfile-id shd/arb -p /var/run/gluster/shd/a'.
Program terminated with signal 11, Segmentation fault.
#0  __get_heard_from_all_status (this=this@entry=0x7fb534263680) at afr-common.c:5024
5024	    for (i = 0; i < priv->child_count; i++) {
Missing separate debuginfos, use: debuginfo-install glibc-2.17-292.el7.x86_64 keyutils-libs-1.5.8-3.el7.x86_64 krb5-libs-1.15.1-37.el7_6.x86_64 libcom_err-1.42.9-15.el7.x86_64 libgcc-4.8.5-39.el7.x86_64 libselinux-2.5-14.1.el7.x86_64 libuuid-2.23.2-61.el7.x86_64 openssl-libs-1.0.2k-19.el7.x86_64 pcre-8.32-17.el7.x86_64 zlib-1.2.7-18.el7.x86_64
(gdb) bt
#0  __get_heard_from_all_status (this=this@entry=0x7fb534263680) at afr-common.c:5024
#1  0x00007fb5447b6e27 in afr_notify (this=0x7fb534263680, event=6, data=0x7fb53425c080, data2=<optimized out>)
    at afr-common.c:5519
#2  0x00007fb5447b76c9 in notify (this=<optimized out>, event=<optimized out>, data=<optimized out>) at afr.c:42
#3  0x00007fb553167272 in xlator_notify (xl=0x7fb534263680, event=event@entry=6, data=data@entry=0x7fb53425c080)
    at xlator.c:699
#4  0x00007fb55321e1d5 in default_notify (this=this@entry=0x7fb53425c080, event=event@entry=6, data=data@entry=0x0)
    at defaults.c:3388
#5  0x00007fb5449f1469 in client_notify_dispatch (this=this@entry=0x7fb53425c080, event=event@entry=6, 
    data=data@entry=0x0) at client.c:97
#6  0x00007fb5449f14ca in client_notify_dispatch_uniq (this=this@entry=0x7fb53425c080, event=event@entry=6, 
    data=data@entry=0x0) at client.c:71
#7  0x00007fb5449f2465 in client_rpc_notify (rpc=0x7fb5343ecb30, mydata=0x7fb53425c080, event=<optimized out>, 
    data=<optimized out>) at client.c:2373
#8  0x00007fb552f13203 in rpc_clnt_handle_disconnect (conn=0x7fb5343ecb60, clnt=0x7fb5343ecb30) at rpc-clnt.c:826
#9  rpc_clnt_notify (trans=0x7fb5343ecdf0, mydata=0x7fb5343ecb60, event=RPC_TRANSPORT_DISCONNECT, 
    data=<optimized out>) at rpc-clnt.c:887
#10 0x00007fb552f0fa53 in rpc_transport_notify (this=this@entry=0x7fb5343ecdf0, 
    event=event@entry=RPC_TRANSPORT_DISCONNECT, data=data@entry=0x7fb5343ecdf0) at rpc-transport.c:547
#11 0x00007fb5475292df in socket_event_poll_err (this=this@entry=0x7fb5343ecdf0, gen=gen@entry=1, idx=idx@entry=18)
    at socket.c:1385
#12 0x00007fb54752b6ea in socket_event_handler (fd=<optimized out>, idx=<optimized out>, gen=<optimized out>, 
    data=0x7fb5343ecdf0, poll_in=<optimized out>, poll_out=<optimized out>, poll_err=16, event_thread_died=0 '\000')
    at socket.c:3008
#13 0x00007fb5531cf8d6 in event_dispatch_epoll_handler (event=0x7fb545476130, event_pool=0x56031562e560)
    at event-epoll.c:648
#14 event_dispatch_epoll_worker (data=0x56031567eaa0) at event-epoll.c:761
#15 0x00007fb551fa8ea5 in start_thread () from /lib64/libpthread.so.0
#16 0x00007fb55186e8cd in clone () from /lib64/libc.so.6
(gdb)

Comment 9 Nag Pavan Chilakam 2019-07-03 11:08:03 UTC
pending frames:
frame : type(0) op(0)
patchset: git://git.gluster.org/glusterfs.git
signal received: 11
time of crash: 
2019-07-03 11:00:11
configuration details:
argp 1
backtrace 1
dlfcn 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 6.0
/lib64/libglusterfs.so.0(+0x27210)[0x7fb55316a210]
/lib64/libglusterfs.so.0(gf_print_trace+0x334)[0x7fb553174c34]
/lib64/libc.so.6(+0x363f0)[0x7fb5517a63f0]
/usr/lib64/glusterfs/6.0/xlator/cluster/replicate.so(+0x5e447)[0x7fb5447a3447]
/usr/lib64/glusterfs/6.0/xlator/cluster/replicate.so(+0x71e27)[0x7fb5447b6e27]
/usr/lib64/glusterfs/6.0/xlator/cluster/replicate.so(+0x726c9)[0x7fb5447b76c9]
/lib64/libglusterfs.so.0(xlator_notify+0x32)[0x7fb553167272]
/lib64/libglusterfs.so.0(default_notify+0x195)[0x7fb55321e1d5]
/usr/lib64/glusterfs/6.0/xlator/protocol/client.so(+0x14469)[0x7fb5449f1469]
/usr/lib64/glusterfs/6.0/xlator/protocol/client.so(+0x144ca)[0x7fb5449f14ca]
/usr/lib64/glusterfs/6.0/xlator/protocol/client.so(+0x15465)[0x7fb5449f2465]
/lib64/libgfrpc.so.0(+0xf203)[0x7fb552f13203]
/lib64/libgfrpc.so.0(rpc_transport_notify+0x23)[0x7fb552f0fa53]
/usr/lib64/glusterfs/6.0/rpc-transport/socket.so(+0x82df)[0x7fb5475292df]
/usr/lib64/glusterfs/6.0/rpc-transport/socket.so(+0xa6ea)[0x7fb54752b6ea]
/lib64/libglusterfs.so.0(+0x8c8d6)[0x7fb5531cf8d6]
/lib64/libpthread.so.0(+0x7ea5)[0x7fb551fa8ea5]
/lib64/libc.so.6(clone+0x6d)[0x7fb55186e8cd]

Comment 10 Mohammed Rafi KC 2019-07-05 15:15:11 UTC
upstream patch: https://review.gluster.org/#/c/glusterfs/+/23006/1

