Bug 1653742 - glusterd crashed while running volume status detail continuously from node N1 and restarting glusterd on N2/N3
Summary: glusterd crashed while running volume status detail continuously from node N1 ...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: glusterd
Version: rhgs-3.4
Hardware: x86_64
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: RHGS 3.4.z Batch Update 4
Assignee: Atin Mukherjee
QA Contact: Bala Konda Reddy M
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2018-11-27 14:25 UTC by Bala Konda Reddy M
Modified: 2019-03-27 03:44 UTC

Fixed In Version: glusterfs-3.12.2-41
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-03-27 03:43:39 UTC
Embargoed:




Links
System | ID | Private | Priority | Status | Summary | Last Updated
Red Hat Bugzilla | 1654161 | 0 | medium | CLOSED | glusterd crashed with seg fault possibly during node reboot while volume creates and deletes were happening | 2021-02-22 00:41:40 UTC
Red Hat Product Errata | RHBA-2019:0658 | 0 | None | None | None | 2019-03-27 03:44:55 UTC

Internal Links: 1654161

Description Bala Konda Reddy M 2018-11-27 14:25:33 UTC
Description of problem:
On a three-node cluster (N1, N2, N3), "gluster vol status rep3_3 detail" was being executed continuously on N1 when glusterd was restarted on one of the other nodes (N2 or N3; not sure which one).
glusterd dumped core on both N2 and N3.


Version-Release number of selected component (if applicable):
glusterfs-3.12.2-27.el7rhgs.x86_64

How reproducible:
1/1

Steps to Reproduce:
1. Form a three-node cluster with brick-mux enabled
2. Create and start three replica (1X3) volumes
3. Create 300 more volumes of type replicate (1X3) without starting them
4. Execute "gluster vol status rep3_3 detail" continuously on N1 to check for memory leaks
5. Restart glusterd on one of the nodes N2/N3 (not sure on which node glusterd was restarted)
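The steps above can be sketched as a small script. This is only an approximation of the setup, not the exact commands that were run: the brick path /bricks is an assumption, and the testvol_ prefix is borrowed from the "Lock not released for testvol_99" log line further down, so using it for the 300 unstarted volumes is a guess. By default the script only prints the commands.

```shell
#!/bin/sh
# Sketch of the reproduction steps. /bricks and the testvol_ prefix are
# assumptions; brick directories must exist on every node before a real run.
DRY_RUN="${DRY_RUN:-1}"              # 1 = only print the commands (default)
BRICK_ROOT="${BRICK_ROOT:-/bricks}"  # hypothetical brick location

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then echo "$*"; else "$@"; fi
}

# Create a 1X3 replicate volume with one brick on each node.
create_vol() {
    run gluster --mode=script volume create "$1" replica 3 \
        "N1:$BRICK_ROOT/$1" "N2:$BRICK_ROOT/$1" "N3:$BRICK_ROOT/$1"
}

setup() {
    # Step 1: enable brick multiplexing cluster-wide.
    run gluster --mode=script volume set all cluster.brick-multiplex on
    # Step 2: three started replica volumes.
    for i in 1 2 3; do
        create_vol "rep3_$i"
        run gluster --mode=script volume start "rep3_$i"
    done
    # Step 3: 300 volumes that are created but never started.
    i=1
    while [ "$i" -le 300 ]; do
        create_vol "testvol_$i"
        i=$((i + 1))
    done
}

# Step 4: poll volume status on N1 until interrupted.
status_loop() {
    while :; do
        run gluster --mode=script volume status rep3_3 detail
        sleep 1
    done
}

# Step 5, on N2 or N3 while status_loop is running on N1:
#   systemctl restart glusterd
```

To run it for real on N1, set DRY_RUN=0, call setup once and then status_loop, and restart glusterd on N2 or N3 from a second terminal.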

Actual results:
glusterd core dumps on two nodes
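The backtraces in this report were taken interactively with "t a a bt", which is gdb shorthand for "thread apply all bt". As a sketch, the same output can be collected non-interactively; gdb's -batch mode disables pagination, so no "Type <return> to continue" prompts end up in the paste. The core file path below is a placeholder, and the helper name collect_bt is made up for this example.

```shell
#!/bin/sh
# Dump the backtrace of every thread from a core file without the pager.
# -batch and -ex are standard gdb options; the paths are supplied by the caller.
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the command (default)

collect_bt() {
    binary="$1"   # e.g. /usr/sbin/glusterd
    core="$2"     # path to the core file (placeholder)
    if [ "$DRY_RUN" = "1" ]; then
        printf 'gdb -batch -ex "thread apply all bt" %s %s\n' "$binary" "$core"
    else
        gdb -batch -ex "thread apply all bt" "$binary" "$core"
    fi
}
```

For example, DRY_RUN=0 collect_bt /usr/sbin/glusterd /path/to/core > bt.txt writes all thread backtraces to bt.txt.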

Node 2 core dump (bt and t a a bt output)

####################################################################################################################
t a a bt

warning: core file may not match specified executable file.
Reading symbols from /usr/sbin/glusterfsd...Reading symbols from /usr/lib/debug/usr/sbin/glusterfsd.debug...done.
done.
Missing separate debuginfo for 
Try: yum --enablerepo='*debug*' install /usr/lib/debug/.build-id/16/3c2dc43405427478788bad0afd537a7acf7a13
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `/usr/sbin/glusterd -p /var/run/glusterd.pid --log-level INFO'.
Program terminated with signal 11, Segmentation fault.
#0  0x00007f04e815f0ad in rcu_read_lock_bp () from /lib64/liburcu-bp.so.1
Missing separate debuginfos, use: debuginfo-install bzip2-libs-1.0.6-13.el7.x86_64 device-mapper-event-libs-1.02.149-10.el7_6.2.x86_64 device-mapper-libs-1.02.149-10.el7_6.2.x86_64 elfutils-libelf-0.172-2.el7.x86_64 elfutils-libs-0.172-2.el7.x86_64 glibc-2.17-260.el7.x86_64 keyutils-libs-1.5.8-3.el7.x86_64 krb5-libs-1.15.1-34.el7.x86_64 libaio-0.3.109-13.el7.x86_64 libattr-2.4.46-13.el7.x86_64 libblkid-2.23.2-59.el7.x86_64 libcap-2.22-9.el7.x86_64 libcom_err-1.42.9-13.el7.x86_64 libgcc-4.8.5-36.el7.x86_64 libselinux-2.5-14.1.el7.x86_64 libsepol-2.5-10.el7.x86_64 libuuid-2.23.2-59.el7.x86_64 libxml2-2.9.1-6.el7_2.3.x86_64 lvm2-libs-2.02.180-10.el7_6.2.x86_64 openssl-libs-1.0.2k-16.el7.x86_64 pcre-8.32-17.el7.x86_64 sssd-client-1.16.2-13.el7.x86_64 systemd-libs-219-62.el7.x86_64 userspace-rcu-0.7.9-2.el7rhgs.x86_64 xz-libs-5.2.2-1.el7.x86_64 zlib-1.2.7-18.el7.x86_64
(gdb) t a a bt

Thread 9 (Thread 0x7f04eaa67700 (LWP 25849)):
#0  0x00007f04e6f03410 in dm_get_suspended_counter@plt () from /lib64/libdevmapper.so.1.02
#1  0x00007f04e6f0382a in dm_lib_exit () from /lib64/libdevmapper.so.1.02
#2  0x00007f04f3f4efca in _dl_fini () from /lib64/ld-linux-x86-64.so.2
#3  0x00007f04f22ccb69 in __run_exit_handlers () from /lib64/libc.so.6
#4  0x00007f04f22ccbb7 in exit () from /lib64/libc.so.6
#5  0x00005616d710447f in cleanup_and_exit (signum=15) at glusterfsd.c:1423
#6  0x00005616d7104575 in glusterfs_sigwaiter (arg=<optimized out>) at glusterfsd.c:2145
#7  0x00007f04f2ac8dd5 in start_thread () from /lib64/libpthread.so.0
#8  0x00007f04f2390ead in clone () from /lib64/libc.so.6

Thread 8 (Thread 0x7f04eb268700 (LWP 25848)):
#0  0x00007f04f2acfe3d in nanosleep () from /lib64/libpthread.so.0
#1  0x00007f04f3c77c96 in gf_timer_proc (data=0x5616d74af270) at timer.c:174
#2  0x00007f04f2ac8dd5 in start_thread () from /lib64/libpthread.so.0
#3  0x00007f04f2390ead in clone () from /lib64/libc.so.6

Thread 7 (Thread 0x7f04f414d780 (LWP 25847)):
#0  0x00007f04f2ac9f47 in pthread_join () from /lib64/libpthread.so.0
#1  0x00007f04f3cc7e78 in event_dispatch_epoll (event_pool=0x5616d74a7a30) at event-epoll.c:746
#2  0x00005616d7101247 in main (argc=5, argv=<optimized out>) at glusterfsd.c:2550

Thread 6 (Thread 0x7f04e3d86700 (LWP 26041)):
#0  0x00007f04f2acc965 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f04e87b19bb in hooks_worker (args=<optimized out>) at glusterd-hooks.c:529
#2  0x00007f04f2ac8dd5 in start_thread () from /lib64/libpthread.so.0
#3  0x00007f04f2390ead in clone () from /lib64/libc.so.6

Thread 5 (Thread 0x7f04e2984700 (LWP 17711)):
#0  0x00007f04f238b1c9 in syscall () from /lib64/libc.so.6
#1  0x00007f04e815ec14 in call_rcu_thread () from /lib64/liburcu-bp.so.1
#2  0x00007f04f2ac8dd5 in start_thread () from /lib64/libpthread.so.0
#3  0x00007f04f2390ead in clone () from /lib64/libc.so.6

Thread 4 (Thread 0x7f04e3585700 (LWP 26042)):
#0  0x00007f04f2391483 in epoll_wait () from /lib64/libc.so.6
#1  0x00007f04f3cc7712 in event_dispatch_epoll_worker (data=0x5616d7a79aa0) at event-epoll.c:649
#2  0x00007f04f2ac8dd5 in start_thread () from /lib64/libpthread.so.0
#3  0x00007f04f2390ead in clone () from /lib64/libc.so.6

Thread 3 (Thread 0x7f04e9a65700 (LWP 25851)):
#0  0x00007f04f2accd12 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f04f3ca5178 in syncenv_task (proc=proc@entry=0x5616d74afa90) at syncop.c:603
#2  0x00007f04f3ca6040 in syncenv_processor (thdata=0x5616d74afa90) at syncop.c:695
#3  0x00007f04f2ac8dd5 in start_thread () from /lib64/libpthread.so.0
#4  0x00007f04f2390ead in clone () from /lib64/libc.so.6

Thread 2 (Thread 0x7f04ea266700 (LWP 25850)):
#0  0x00007f04f2357e2d in nanosleep () from /lib64/libc.so.6
#1  0x00007f04f2357cc4 in sleep () from /lib64/libc.so.6
#2  0x00007f04f3c9250d in pool_sweeper (arg=<optimized out>) at mem-pool.c:481
#3  0x00007f04f2ac8dd5 in start_thread () from /lib64/libpthread.so.0
#4  0x00007f04f2390ead in clone () from /lib64/libc.so.6

Thread 1 (Thread 0x7f04e9264700 (LWP 25852)):
#0  0x00007f04e815f0ad in rcu_read_lock_bp () from /lib64/liburcu-bp.so.1
#1  0x00007f04e87e3aa7 in glusterd_peerinfo_find_by_uuid (
    uuid=uuid@entry=0x7f04d5021580 "\030.\272\310$\257K\366\231=\bo\251\347\022;>:S\240p\353G\355\275\322\335q\254\250\315\374\022") at glusterd-peer-utils.c:193
#2  0x00007f04e87da510 in glusterd_handle_mgmt_v3_lock_fn (req=req@entry=0x7f04d4b74ad0) at glusterd-mgmt-handler.c:157
#3  0x00007f04e86f0b7e in glusterd_big_locked_handler (req=0x7f04d4b74ad0, actor_fn=0x7f04e87da430 <glusterd_handle_mgmt_v3_lock_fn>) at glusterd-handler.c:82
#4  0x00007f04f3ca2ba0 in synctask_wrap () at syncop.c:375
#5  0x00007f04f22db010 in ?? () from /lib64/libc.so.6
#6  0x0000000000000000 in ?? ()
(gdb) 
##############################################################################################################################################################

(gdb) bt
#0  0x00007f04e815f0ad in rcu_read_lock_bp () from /lib64/liburcu-bp.so.1
#1  0x00007f04e87e3aa7 in glusterd_peerinfo_find_by_uuid (
    uuid=uuid@entry=0x7f04d5021580 "\030.\272\310$\257K\366\231=\bo\251\347\022;>:S\240p\353G\355\275\322\335q\254\250\315\374\022") at glusterd-peer-utils.c:193
#2  0x00007f04e87da510 in glusterd_handle_mgmt_v3_lock_fn (req=req@entry=0x7f04d4b74ad0) at glusterd-mgmt-handler.c:157
#3  0x00007f04e86f0b7e in glusterd_big_locked_handler (req=0x7f04d4b74ad0, actor_fn=0x7f04e87da430 <glusterd_handle_mgmt_v3_lock_fn>) at glusterd-handler.c:82
#4  0x00007f04f3ca2ba0 in synctask_wrap () at syncop.c:375
#5  0x00007f04f22db010 in ?? () from /lib64/libc.so.6
#6  0x0000000000000000 in ?? ()

################################################################################
[2018-11-26 07:09:31.325762] W [MSGID: 106118] [glusterd-handler.c:6458:__glusterd_peer_rpc_notify] 0-management: Lock not released for testvol_99
[2018-11-26 07:09:37.286847] W [glusterfsd.c:1367:cleanup_and_exit] (-->/lib64/libpthread.so.0(+0x7dd5) [0x7f04f2ac8dd5] -->/usr/sbin/glusterd(glusterfs_sigwaiter+0xe5) [0x5616d7104575] -->/usr/sbin/glusterd(cleanup_and_exit+0x6b) [0x5616d71043eb] ) 0-: received signum (15), shutting down
pending frames:
frame : type(0) op(0)
patchset: git://git.gluster.org/glusterfs.git
signal received: 11
time of crash:
2018-11-26 07:09:37
configuration details:
argp 1
backtrace 1
dlfcn 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 3.12.2
/lib64/libglusterfs.so.0(_gf_msg_backtrace_nomem+0x9d)[0x7f04f3c69dfd]
/lib64/libglusterfs.so.0(gf_print_trace+0x334)[0x7f04f3c73ec4]
/lib64/libc.so.6(+0x36280)[0x7f04f22c9280]
/lib64/liburcu-bp.so.1(rcu_read_lock_bp+0x2d)[0x7f04e815f0ad]
/usr/lib64/glusterfs/3.12.2/xlator/mgmt/glusterd.so(+0x116aa7)[0x7f04e87e3aa7]
/usr/lib64/glusterfs/3.12.2/xlator/mgmt/glusterd.so(+0x10d510)[0x7f04e87da510]
/usr/lib64/glusterfs/3.12.2/xlator/mgmt/glusterd.so(+0x23b7e)[0x7f04e86f0b7e]
/lib64/libglusterfs.so.0(synctask_wrap+0x10)[0x7f04f3ca2ba0]
/lib64/libc.so.6(+0x48010)[0x7f04f22db010]
---------
#######################################################################

Node 3 core dump
#############################################################################
t a a bt output
warning: core file may not match specified executable file.
Reading symbols from /usr/sbin/glusterfsd...Reading symbols from /usr/lib/debug/usr/sbin/glusterfsd.debug...done.
done.
Missing separate debuginfo for 
Try: yum --enablerepo='*debug*' install /usr/lib/debug/.build-id/16/3c2dc43405427478788bad0afd537a7acf7a13
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `/usr/sbin/glusterd -p /var/run/glusterd.pid --log-level INFO'.
Program terminated with signal 11, Segmentation fault.
#0  0x00007f034298b0ad in rcu_read_lock_bp () from /lib64/liburcu-bp.so.1
Missing separate debuginfos, use: debuginfo-install bzip2-libs-1.0.6-13.el7.x86_64 device-mapper-event-libs-1.02.149-10.el7_6.2.x86_64 device-mapper-libs-1.02.149-10.el7_6.2.x86_64 elfutils-libelf-0.172-2.el7.x86_64 elfutils-libs-0.172-2.el7.x86_64 glibc-2.17-260.el7.x86_64 keyutils-libs-1.5.8-3.el7.x86_64 krb5-libs-1.15.1-34.el7.x86_64 libaio-0.3.109-13.el7.x86_64 libattr-2.4.46-13.el7.x86_64 libblkid-2.23.2-59.el7.x86_64 libcap-2.22-9.el7.x86_64 libcom_err-1.42.9-13.el7.x86_64 libgcc-4.8.5-36.el7.x86_64 libselinux-2.5-14.1.el7.x86_64 libsepol-2.5-10.el7.x86_64 libuuid-2.23.2-59.el7.x86_64 libxml2-2.9.1-6.el7_2.3.x86_64 lvm2-libs-2.02.180-10.el7_6.2.x86_64 openssl-libs-1.0.2k-16.el7.x86_64 pcre-8.32-17.el7.x86_64 sssd-client-1.16.2-13.el7.x86_64 systemd-libs-219-62.el7.x86_64 userspace-rcu-0.7.9-2.el7rhgs.x86_64 xz-libs-5.2.2-1.el7.x86_64 zlib-1.2.7-18.el7.x86_64
(gdb) t a a bt

Thread 9 (Thread 0x7f033df32700 (LWP 4870)):
#0  0x00007f034cbbd483 in epoll_wait () from /lib64/libc.so.6
#1  0x00007f034e4f3712 in event_dispatch_epoll_worker (data=0x55f8535cba30)
    at event-epoll.c:649
#2  0x00007f034d2f4dd5 in start_thread () from /lib64/libpthread.so.0
#3  0x00007f034cbbcead in clone () from /lib64/libc.so.6

Thread 8 (Thread 0x7f033e733700 (LWP 4869)):
#0  0x00007f034d2f8965 in pthread_cond_wait@@GLIBC_2.3.2 ()
   from /lib64/libpthread.so.0
#1  0x00007f0342fdd9bb in hooks_worker (args=<optimized out>) at glusterd-hooks.c:529
#2  0x00007f034d2f4dd5 in start_thread () from /lib64/libpthread.so.0
#3  0x00007f034cbbcead in clone () from /lib64/libc.so.6

Thread 7 (Thread 0x7f0344291700 (LWP 4630)):
#0  0x00007f034d2f8d12 in pthread_cond_timedwait@@GLIBC_2.3.2 ()
   from /lib64/libpthread.so.0
#1  0x00007f034e4d1178 in syncenv_task (proc=proc@entry=0x55f853430a90)
    at syncop.c:603
#2  0x00007f034e4d2040 in syncenv_processor (thdata=0x55f853430a90) at syncop.c:695
#3  0x00007f034d2f4dd5 in start_thread () from /lib64/libpthread.so.0
#4  0x00007f034cbbcead in clone () from /lib64/libc.so.6

Thread 6 (Thread 0x7f0345293700 (LWP 4628)):
#0  0x00007f0341983880 in __do_global_dtors_aux () from /lib64/libblkid.so.1
#1  0x00007f034e77afca in _dl_fini () from /lib64/ld-linux-x86-64.so.2
#2  0x00007f034caf8b69 in __run_exit_handlers () from /lib64/libc.so.6
#3  0x00007f034caf8bb7 in exit () from /lib64/libc.so.6
#4  0x000055f8518ca47f in cleanup_and_exit (signum=15) at glusterfsd.c:1423
#5  0x000055f8518ca575 in glusterfs_sigwaiter (arg=<optimized out>)
    at glusterfsd.c:2145
#6  0x00007f034d2f4dd5 in start_thread () from /lib64/libpthread.so.0
#7  0x00007f034cbbcead in clone () from /lib64/libc.so.6

Thread 5 (Thread 0x7f0344a92700 (LWP 4629)):
#0  0x00007f034cb83e2d in nanosleep () from /lib64/libc.so.6
#1  0x00007f034cb83cc4 in sleep () from /lib64/libc.so.6
#2  0x00007f034e4be50d in pool_sweeper (arg=<optimized out>) at mem-pool.c:481
#3  0x00007f034d2f4dd5 in start_thread () from /lib64/libpthread.so.0
#4  0x00007f034cbbcead in clone () from /lib64/libc.so.6

Thread 4 (Thread 0x7f033d5b0700 (LWP 28689)):
#0  0x00007f034cbb71c9 in syscall () from /lib64/libc.so.6
#1  0x00007f034298ac14 in call_rcu_thread () from /lib64/liburcu-bp.so.1
#2  0x00007f034d2f4dd5 in start_thread () from /lib64/libpthread.so.0
#3  0x00007f034cbbcead in clone () from /lib64/libc.so.6

Thread 3 (Thread 0x7f034e979780 (LWP 4626)):
#0  0x00007f034d2f5f47 in pthread_join () from /lib64/libpthread.so.0
#1  0x00007f034e4f3e78 in event_dispatch_epoll (event_pool=0x55f853428a30)
    at event-epoll.c:746
#2  0x000055f8518c7247 in main (argc=5, argv=<optimized out>) at glusterfsd.c:2550

Thread 2 (Thread 0x7f0345a94700 (LWP 4627)):
#0  0x00007f034d2fbe3d in nanosleep () from /lib64/libpthread.so.0
#1  0x00007f034e4a3c96 in gf_timer_proc (data=0x55f853430270) at timer.c:174
#2  0x00007f034d2f4dd5 in start_thread () from /lib64/libpthread.so.0
#3  0x00007f034cbbcead in clone () from /lib64/libc.so.6

Thread 1 (Thread 0x7f0343a90700 (LWP 4631)):
#0  0x00007f034298b0ad in rcu_read_lock_bp () from /lib64/liburcu-bp.so.1
#1  0x00007f0342f14a08 in __glusterd_handle_stage_op (req=req@entry=0x7f0330060bb0)
    at glusterd-handler.c:1062
#2  0x00007f0342f1cb7e in glusterd_big_locked_handler (req=0x7f0330060bb0, 
    actor_fn=0x7f0342f14870 <__glusterd_handle_stage_op>) at glusterd-handler.c:82
#3  0x00007f034e4ceba0 in synctask_wrap () at syncop.c:375
#4  0x00007f034cb07010 in ?? () from /lib64/libc.so.6
#5  0x0000000000000000 in ?? ()
(gdb) 
#####################################################################################################
(gdb) bt
#0  0x00007f034298b0ad in rcu_read_lock_bp () from /lib64/liburcu-bp.so.1
#1  0x00007f0342f14a08 in __glusterd_handle_stage_op (req=req@entry=0x7f0330060bb0)
    at glusterd-handler.c:1062
#2  0x00007f0342f1cb7e in glusterd_big_locked_handler (req=0x7f0330060bb0, 
    actor_fn=0x7f0342f14870 <__glusterd_handle_stage_op>) at glusterd-handler.c:82
#3  0x00007f034e4ceba0 in synctask_wrap () at syncop.c:375
#4  0x00007f034cb07010 in ?? () from /lib64/libc.so.6
#5  0x0000000000000000 in ?? ()
(gdb) q

##########################################################################################
[2018-11-26 06:21:50.100207] I [run.c:190:runner_log] (-->/usr/lib64/glusterfs/3.12.2/xlator/mgmt/glusterd.so(+0xe49fa) [0x7f0342fdd9fa] -->/usr/lib64/glusterfs/3.12.2/xlator/mgmt/glusterd.so(+0xe44bd) [0x7f0342fdd4bd] -->/lib64/libglusterfs.so.0(runner_log+0x115) [0x7f034e4e5225] ) 0-management: Ran script: /var/lib/glusterd/hooks/1/start/post/S30samba-start.sh --volname=rep3_2 --first=no --version=1 --volume-op=start --gd-workdir=/var/lib/glusterd
[2018-11-26 07:09:26.653582] W [glusterfsd.c:1367:cleanup_and_exit] (-->/lib64/libpthread.so.0(+0x7dd5) [0x7f034d2f4dd5] -->/usr/sbin/glusterd(glusterfs_sigwaiter+0xe5) [0x55f8518ca575] -->/usr/sbin/glusterd(cleanup_and_exit+0x6b) [0x55f8518ca3eb] ) 0-: received signum (15), shutting down
pending frames:
frame : type(0) op(0)
patchset: git://git.gluster.org/glusterfs.git
signal received: 11
time of crash:
2018-11-26 07:09:26
configuration details:
argp 1
backtrace 1
dlfcn 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 3.12.2
/lib64/libglusterfs.so.0(_gf_msg_backtrace_nomem+0x9d)[0x7f034e495dfd]
/lib64/libglusterfs.so.0(gf_print_trace+0x334)[0x7f034e49fec4]
/lib64/libc.so.6(+0x36280)[0x7f034caf5280]
/lib64/liburcu-bp.so.1(rcu_read_lock_bp+0x2d)[0x7f034298b0ad]
/usr/lib64/glusterfs/3.12.2/xlator/mgmt/glusterd.so(+0x1ba08)[0x7f0342f14a08]
/usr/lib64/glusterfs/3.12.2/xlator/mgmt/glusterd.so(+0x23b7e)[0x7f0342f1cb7e]
/lib64/libglusterfs.so.0(synctask_wrap+0x10)[0x7f034e4ceba0]
/lib64/libc.so.6(+0x48010)[0x7f034cb07010]
---------

Expected results:
No crash/core should be generated

Additional info:

Comment 5 Sanju 2018-11-28 11:31:31 UTC
upstream patch: https://review.gluster.org/#/c/glusterfs/+/21743

Comment 17 errata-xmlrpc 2019-03-27 03:43:39 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:0658

